Preface


ASIACRYPT 2015, the 21st Annual International Conference on Theory and Appli- 
cation of Cryptology and Information Security, was held on the city campus of the 
University of Auckland, New Zealand, from November 29 to December 3, 2015. 
The conference focused on all technical aspects of cryptology, and was sponsored by 
the International Association for Cryptologic Research (IACR). 

The conference received 251 submissions from all over the world. The program 
included 64 papers selected from these submissions by a Program Committee 
(PC) comprising 43 leading experts of the field. In order to accommodate as many 
high-quality submissions as possible, the conference ran in two parallel sessions, and 
these two-volume proceedings contain the revised versions of the papers that were 
selected. The revised versions were not reviewed again and the authors are responsible 
for their contents. 

The selection of the papers was made through the usual double-blind review pro- 
cess. Each submission was assigned three reviewers and submissions by PC members 
were assigned five reviewers. The selection process was assisted by a total of 339 
external reviewers. Following the individual review phase, the selection process 
involved an extensive discussion phase. 

This year, the conference featured three invited talks. Phillip Rogaway gave the 
2015 IACR Distinguished Lecture on “The Moral Character of Cryptographic Work,” 
Gilles Barthe gave a talk on “Computer-Aided Cryptography: Status and Perspectives,” 
and Masayuki Abe spoke on “Structure-Preserving Cryptography.” The proceedings 
contain the abstracts of these talks. The conference also featured a traditional rump 
session that contained short presentations on the latest research results of the field. 

The best paper award was decided based on a vote by the PC members, and it was
given to “Improved Security Proofs in Lattice-Based Cryptography: Using the Rényi
Divergence Rather than the Statistical Distance” by Shi Bai, Adeline Langlois,
Tancrède Lepoint, Damien Stehlé, and Ron Steinfeld. Two more papers, “Key-Recovery
Attacks on ASASA” by Brice Minaud, Patrick Derbez, Pierre-Alain Fouque, and Pierre
Karpman, and “The Tower Number Field Sieve” by Razvan Barbulescu, Pierrick
Gaudry, and Thorsten Kleinjung, were solicited to submit full versions to the Journal
of Cryptology.

ASIACRYPT 2015 was made possible by the contributions of many people. We 
would like to thank the authors for submitting their research results to the conference. 
We are deeply grateful to all the PC members and all the external reviewers for their 
hard work to determine the program of the conference. We sincerely thank Steven 
Galbraith, the general chair of the conference, and the members of the local Organizing 
Committee for handling all the organizational work of the conference. We also thank 
Nigel Smart for organizing and chairing the rump session. 

We thank Shai Halevi for setting up and letting us use the IACR conference 
management software. Springer published the two-volume proceedings and made these 
available at the conference. We thank Alfred Hofmann, Anna Kramer, and their col- 
leagues for handling the editorial process. Last but not least, we thank the speakers, 
session chairs, and all the participants for coming to Auckland and contributing to 
ASIACRYPT 2015. 

December 2015 Tetsu Iwata 

Jung Hee Cheon 
Structure-Preserving Cryptography


Masayuki Abe 

NTT Secure Platform Laboratories, NTT Corporation, Tokyo, Japan 
abe.masayuki@lab.ntt.co.jp


Bilinear groups have been a common ground for building cryptographic schemes since their
introduction in seminal works [3, 5, 6]. Besides being useful for directly designing
schemes thanks to their rich mathematical structure, they allow modular construction of
complex schemes from simpler building blocks that work over the same bilinear groups.
Namely, given a description of bilinear groups, several building blocks exchange group
elements with each other, and the security of the resulting scheme is proven based on the
security of the underlying building blocks. Unfortunately, things are not that easy in
reality. Building blocks often require glue that bridges incompatible interfaces, or they
have to be modified to work together, and the security has to be re-proved.

Structure-preserving cryptography [2] is a paradigm for designing cryptographic
schemes over bilinear groups. A cryptographic scheme is called structure preserving if
all its public inputs and outputs consist of group elements of bilinear groups and the
functional correctness can be verified only by computing group operations, testing group
membership, and evaluating pairing product equations. Due to the regulated interface,
structure-preserving schemes are highly interoperable, as desired in modular
constructions. In particular, the combination of structure-preserving signatures and the
noninteractive proof system of [4] yields numerous applications that protect signers’ or
receivers’ privacy. On the other hand, the required properties make some important
primitives, such as pseudo-random functions and collision-resistant shrinking
commitments, unavailable in the world of structure-preserving cryptography.
Interestingly, however, the constraints on the verification of correctness make it
possible to argue non-trivial lower bounds on some aspects of efficiency, such as
signature size in structure-preserving signature schemes.

Since the first use of the term “structure-preserving” in [1] in 2010, intensive
research has been done in the area. In this talk, we overview the state of the art on several
structure-preserving schemes, including commitments and signatures, with a careful
look at underlying assumptions, known bounds, and impossibility results. We also
present open questions and discuss promising directions for further research.
Computer-Aided Cryptography
Status and Perspectives 


Gilles Barthe 

IMDEA Software Institute, Madrid, Spain 


Computer-aided cryptography is an emerging discipline which advocates the use of 
computer tools for building and mechanically verifying the security of cryptographic 
constructions. Computer-aided cryptography builds on the code-based game-based 
approach to cryptographic proofs, and adopts a program verification approach to justify 
common patterns of reasoning, such as equivalence up to bad, lazy sampling, or simply 
program equivalence. Technically, tools like EasyCrypt use a program verification 
method based on probabilistic couplings for reasoning about the relationship between 
two probabilistic programs, and standard tools to reason about the probability of events 
in a single probabilistic program. The combination of these tools, together with general 
mechanisms to instantiate or combine proofs, can be used to verify many examples 
from the literature. 

Recent developments in computer-aided cryptography have explored two different 
directions. On the one hand, several groups have developed fully automated techniques 
to analyze cryptographic constructions in the standard model or hardness assumptions 
in the generic group model. In turn, these tools have been used for synthesizing new 
cryptographic constructions. Transformational synthesis tools take as input a
cryptographic construction, for instance a signature in the Type I setting, and output another
construction, for instance a batch signature or a signature in the Type III setting. In
contrast, generative synthesis tools take as input some size constraints and output a list of
secure cryptographic constructions, for instance padding-based encryption schemes, 
modes of operations, or tweakable blockciphers, meeting the size constraints. On the 
other hand, several groups are working on carrying security proofs to (assembly-level) 
implementations, building on advances in programming languages, notably verified 
compilers. These works open the possibility to reason formally about mitigations used 
by cryptography implementers and to deliver strong mathematical guarantees, in the 
style of provable security, for cryptographic code against more realistic adversaries. 

For further background information, please consult: www.easycrypt.info. 


The Moral Character of Cryptographic Work 


Phillip Rogaway 1 

Department of Computer Science 
University of California, Davis, USA 


Abstract. Cryptography rearranges power: it configures who can do what, from 
what. This makes cryptography an inherently political tool, and it confers on the 
field an intrinsically moral dimension. The Snowden revelations motivate a 
reassessment of the political and moral positioning of cryptography. They lead 
one to ask if our inability to effectively address mass surveillance constitutes a 
failure of our field. I believe that it does. I call for a community-wide effort to 
develop more effective means to resist mass surveillance. I plead for a reinvention
of our disciplinary culture to attend not only to puzzles and math, but, also, to 
the societal implications of our work. 

Keywords: Cryptography • Democracy • Ethics • Mass surveillance • Privacy •
Snowden revelations • Social responsibility


1 Work on the paper and talk associated to this abstract has been supported by NSF Grant CNS 
1228828. Many thanks to the NSF for their continuing support. 
Abstract. The ASASA construction is a new design scheme introduced
at Asiacrypt 2014 by Biryukov, Bouillaguet and Khovratovich. Its
versatility was illustrated by building two public-key encryption schemes, a
secret-key scheme, as well as super S-box subcomponents of a white-box
scheme. However, one of the two public-key cryptosystems was recently
broken at Crypto 2015 by Gilbert, Plut and Treger. As our main
contribution, we propose a new algebraic key-recovery attack able to break at
once the secret-key scheme as well as the remaining public-key scheme, in
time complexity $2^{63}$ and $2^{39}$ respectively (the security parameter is 128
bits in both cases). Furthermore, we present a second attack of independent
interest on the same public-key scheme, which heuristically reduces
its security to solving an LPN instance with tractable parameters. This
allows key recovery in time complexity $2^{56}$. Finally, as a side result, we
outline a very efficient heuristic attack on the white-box scheme, which
breaks an instance claiming 64 bits of security in under one minute on a
single desktop computer.


Keywords: ASASA • Algebraic cryptanalysis • Multivariate cryptography • LPN


1 Introduction 

The idea of creating a public-key cryptosystem by obfuscating a secret-key cipher
was proposed by Diffie and Hellman in 1976, in the same seminal paper that
introduced the idea of public-key encryption [DH76]. While the RSA cryptosystem
was introduced only a year later, creating a public-key scheme based on
symmetric components has remained an open challenge to this day. The interest
of this problem is not merely historical: besides increasing the variety of available
public-key schemes, one can hope that a solution may help bridge the performance
gap between public-key and secret-key cryptosystems, or at least offer
new trade-offs in that regard.

Multivariate cryptography is one way to achieve this goal. This area of 
research dates back to the 1980’s [MI88,FD86], and has been particularly active 
in the late 1990’s and early 2000’s [Pat95,Pat96,RP97,FJ03, . . .]. Many of the 
proposed public-key cryptosystems build an encryption function from a struc- 
tured, easily invertible polynomial, which is then scrambled by affine maps (or 
similarly simple transformations) applied to its input and output to produce the 
encryption function. 

This approach might be aptly described as an ASA structure, which should be
read as the composition of an affine map “A”, a nonlinear transformation of low
algebraic degree “S” (not necessarily made up of smaller S-boxes), and another
affine layer “A”. The secret key is the full description of the three maps A, S, A,
which makes computing both ASA and $(ASA)^{-1}$ easy. The public key is the
function ASA as a whole, which is described in a generic manner by providing
the polynomial expression of each output bit in the input bits (or group of n
bits if the scheme operates on $\mathbb{F}_{2^n}$). Thus the owner of the secret key is able
to encrypt and decrypt at high speed, depending on the structure of S. The
downside is slow public-key operations and a large key size.


The ASASA Construction. Historically, attempts to build public-key encryption
schemes based on the above principle have been ill-fated [FJ03,BFP11,
DGS07,DFSS07,WBDY98, ...]¹. However, several new ideas to build multivariate
schemes were recently introduced by Biryukov, Bouillaguet and Khovratovich
at Asiacrypt 2014 [BBK14]. The paradigm federating these ideas is
the so-called ASASA structure: that is, combining two quadratic mappings S by
interleaving random affine layers A. With quadratic S layers, the overall scheme
has degree 4, so the polynomial description provided by the public key remains
of reasonable size.

This is very similar to the 2R scheme by Patarin [PG97], which fell victim
to several attacks [Bih00,DFKYZD99], including a powerful decomposition
attack [DFKYZD99,FP06], later developed in a general context by Faugère et al.
[FvzGP10,FP09a,FP09b]. The general course of this attack is to differentiate the
encryption function, and observe that the resulting polynomials in the input bits
live in a “small” space entirely determined by the first ASA layers. This essentially
allows the scheme to be broken down into its two ASA sub-components,
which are easily analyzed once isolated. A later attempt to circumvent this and
other attacks by truncating the output of the cipher proved insecure against
the same technique [FP06]: roughly speaking, truncating does not prevent the
derivative polynomials from living in too small a space.


1 HFEv- seems to be an exception in this regard.
In order to thwart attacks including the decomposition technique, the authors
of [BBK14] propose to go in the opposite direction: instead of truncating the
cipher, a perturbation is added, consisting of new random polynomials of degree
four added at fixed positions, prior to the last affine layer². The idea is that
these new random polynomials will be spread over the whole output of the
cipher by the last affine layer. When differentiating, the “noise” introduced by
the perturbation polynomials is intended to drown out the information about
the first quadratic layer otherwise carried by the derivative polynomials, and
thus to foil the decomposition attack.

Based on this idea, two public-key cryptosystems are proposed. One uses
random quadratic expanding S-boxes as nonlinear components, while the other
relies on the χ function, most famous for its use in the SHA-3 winner Keccak.
However, the first scheme was broken at Crypto 2015 by a decomposition attack
[GPT15]: the number of perturbation polynomials turned out to be too small
to prevent this approach. This leaves open the question of the robustness of the
other cryptosystem, based on χ, which we answer negatively.

Black-Box ASASA. Besides public-key cryptosystems, the authors of [BBK14] 
also propose a secret-key (“black-box”) scheme based on the ASASA structure, 
showcasing its versatility. While the structure is the same, the context is entirely 
different. This black-box scheme is in fact the exact counterpart of the SASAS 
structure analyzed by Biryukov and Shamir [BS01]: it is a block cipher operating
on 128-bit inputs; each affine layer is a random affine map on $\mathbb{Z}_2^{128}$, while the
nonlinear layers are composed of 16 random 8-bit S-boxes. The secret key is the 
description of the three affine layers, together with the tables of all S-boxes. 

In some sense, the “public key” is still the encryption function as a whole; 
however it is only accessible in a black-box way through known or chosen- 
plaintext or ciphertext attacks, as any standard secret-key scheme. A major dif- 
ference however is that the encryption function can be easily distinguished from 
a random permutation because the constituent S-boxes have algebraic degree at 
most 7, and hence the whole function has degree at most 49; in particular, it 
sums up to zero over any cube of dimension 50. The security claim is that the 
secret key cannot be recovered, with a security parameter evaluated at 128 bits. 

White-Box ASASA. The structure of the black-box scheme is also used as a 
basis for several white-box proposals. In that setting, a symmetric (black-box) 
ASASA cipher with small block (e.g. 16 bits) is used as a super S-box in a design 
with a larger block. A white-box user is given the super S-box as a table. The 
secret information consists in a much more compact description of the super 
S-box in terms of alternating linear and nonlinear layers. The security of the 
ASASA design is then expected to prevent a white-box user from recovering the 
secret information. 


2 A similar idea was used in [Din04].
1.1 Our Contribution

Algebraic Attack on the Secret-Key and χ-Based Public-Key Schemes.

Despite the difference in nature between the χ-based public-key scheme and the
black-box scheme, we present a new algebraic key-recovery attack able to break 
both schemes at once. This attack does not rely on a decomposition technique. 
Instead, it may be regarded as exploiting the relatively low degree of the encryp- 
tion function, coupled with the low diffusion of nonlinear layers. Furthermore, in 
the case of the public-key scheme, the attack applies regardless of the amount of 
perturbation. Thus, contrary to the attack of [GPT15], there is no hope of patch- 
ing the scheme by increasing the number of perturbation polynomials. As for the 
secret-key scheme, our attack may be seen as a counterpart to the cryptanalysis 
of SASAS in [BS01], and is structural in the same sense. 

While the same attack applies to both schemes, their respective bottlenecks
for the time complexity come from different stages of the attack. For the χ
scheme, the time complexity is dominated by the need to compute the kernel
of a binary matrix of dimension $2^{13}$, which can be evaluated to $2^{39}$ basic linear
operations³. As for the black-box scheme, the time complexity is dominated by
the need to encrypt $2^{63}$ chosen plaintexts, and the data complexity follows.
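
To make the linear-algebra estimate concrete: plain Gaussian elimination over $\mathbb{Z}_2$ on an $N \times N$ matrix costs on the order of $N^3$ bit operations, and $(2^{13})^3 = 2^{39}$. The following is a minimal Python sketch of a kernel computation by elimination, not the implementation used for the attack; the function name `gf2_kernel` and the encoding of rows as integers are assumptions made for illustration.

```python
def gf2_kernel(rows, ncols):
    """Return a basis of {x : M x = 0} over GF(2).

    rows  -- list of ints; bit j of rows[i] is the entry M[i][j]
    ncols -- number of columns of M
    Basis vectors are returned as ints with the same bit convention.
    """
    rows = list(rows)
    pivots = []  # pivots[r] is the pivot column of reduced row r
    r = 0
    for col in range(ncols):
        # Look for a row at or below position r with a 1 in this column.
        pivot = next((i for i in range(r, len(rows)) if (rows[i] >> col) & 1), None)
        if pivot is None:
            continue  # no pivot here: this column is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear the column in every other row (reduced row echelon form).
        for i in range(len(rows)):
            if i != r and (rows[i] >> col) & 1:
                rows[i] ^= rows[r]
        pivots.append(col)
        r += 1

    pivot_set = set(pivots)
    basis = []
    for free in range(ncols):
        if free in pivot_set:
            continue
        vec = 1 << free  # set this free variable to 1, the others to 0
        for row_idx, pcol in enumerate(pivots):
            # Each pivot variable equals the sum of the free variables in its row.
            if (rows[row_idx] >> free) & 1:
                vec |= 1 << pcol
        basis.append(vec)
    return basis


# Rows x0 + x1 = 0 and x1 + x2 = 0: the kernel is spanned by (1, 1, 1).
print(gf2_kernel([0b011, 0b110], 3))  # [7]
```

Storing each row as a machine word (or a big integer) makes a row addition a single XOR, which is the same effect the word-level parallelism mentioned in the footnote exploits.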

This attack actually only peels off the last linear layer of the scheme, reducing
ASASA to ASAS. In the case of the black-box scheme, the remaining layers can
be recovered in negligible time using Biryukov and Shamir’s techniques [BS01].
In the case of the χ scheme, removing the remaining layers poses non-trivial
algorithmic challenges (such as how to efficiently recover quadratic polynomials
$A, B, C \in \mathbb{Z}_2[X_1, \dots, X_n]/(X_i^2 - X_i)$, given $A + B \cdot C$), and some of the
algorithms we propose may be of independent interest. Nevertheless, in the end the
remaining layers are peeled off and the secret key is recovered in time complexity
negligible relative to the cost of removing the first layer.


3 In practice, vector instructions operating on 128-bit inputs would mean that the
meaningful size of the matrix is $2^{13-7} = 2^6$, and in this context the number of basic
linear operations would be much lower. We also disregard asymptotic improvements
such as the Strassen or Coppersmith-Winograd algorithms and their variants. The
main point is that the time complexity is quite low, well within practical reach.


LPN-Based Attack on the χ Scheme. As a second contribution, we present
an entirely different attack, dedicated to the χ public-key scheme. This attack
exploits the fact that each bit at the output of χ is “almost linear” in the input:
indeed the nonlinear component of each bit is a single product, which is equal to
zero with probability 3/4 over all inputs. Based on this property, we are able to
heuristically reduce the problem of breaking the scheme to an LPN-like instance
with easy-to-solve parameters. By LPN-like instance, we mean an instance of a
problem very close to the Learning Parity with Noise problem (LPN), on which
typical LPN-solving algorithms such as the Blum-Kalai-Wasserman algorithm
(BKW) [BKW03] are expected to immediately apply. The time complexity of
this approach is higher than the previous one, and can be evaluated at $2^{56}$ basic
operations. However, it showcases a different weakness of the χ scheme, providing
a different insight into the security of ASASA constructions. In this regard, it is
noteworthy that the security of another recent multivariate scheme, presented
by Huang et al. at PKC’12 [HLY12], was also reduced to an easy instance of
LWE [Reg05], which is an extension of LPN, in [AFF+14]⁴.
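
The “almost linear” property can be checked exhaustively on a small input size. The sketch below assumes the definition of χ given in Sect. 3.4, whose nonlinear term at position i is the product $\overline{a_{i+1}} a_{i+2}$, and measures how often that term vanishes; the helper names are illustrative only.

```python
from itertools import product


def nonlinear_term(a, i):
    """The product (a_{i+1} + 1) * a_{i+2} over Z_2, with indices taken modulo n."""
    n = len(a)
    return (a[(i + 1) % n] ^ 1) & a[(i + 2) % n]


def zero_probability(n, i=0):
    """Fraction of inputs a in {0,1}^n for which the nonlinear term at position i is 0."""
    inputs = list(product((0, 1), repeat=n))
    return sum(nonlinear_term(a, i) == 0 for a in inputs) / len(inputs)


p = zero_probability(7)
print(p, abs(2 * p - 1))  # 0.75 0.5
```

With the convention of Sect. 2, a probability of 3/4 corresponds to a bias of 1/2 per output bit, which is what makes an LPN-style treatment plausible.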

Heuristic Attack on the White-Box Scheme. Finally, as a side result,
we describe a key-recovery attack on white-box ASASA. The attack technique is
unrelated to the previous ones, and its motivation relies on heuristics rather than
a theoretical model. On the other hand, it is very effective on the smallest
white-box instances of [BBK14] (with a security level of 64 bits), which we break in under
a minute on a laptop computer. Thus it seems that the security level offered by
small-block ASASA is much lower than anticipated.

The same attack on white-box schemes was found independently by Dinur, 
Dunkelman, Kranz and Leander [DDKL15]. Their approach focuses on small- 
block ASASA instances, and is thus only applicable to the white-box scheme of 
[BBK14]. Section 5 of [DDKL15] is essentially the same attack as ours, minus 
some heuristic improvements (see [MDFK15]). On the other hand, the authors 
of [DDKL15] present other methods to attack small-block ASASA instances that 
are less reliant on heuristics, but as efficient as our heuristically improved variant, 
and thus provide a better theoretical basis for understanding small-block ASASA, 
as used in the white-box scheme of [BBK14]. 

1.2 Structure of the Article 

Section 3 provides a brief description of the three ASASA schemes under attack. 
In Sect. 4, we present our main attack, as applied to the secret-key (“black-box”) 
scheme. In particular, an overview of the attack is given in Sect. 4.1. The attack 
is then adapted to the x public-key scheme in Sect. 5.1, while the LPN-based 
attack on the same scheme is presented in Sect. 5.2. Finally, our attack on the 
white-box scheme is presented in Sect. 6. 


1.3 Implementation and Full Version 

Due to space constraints, some subordinate algorithms and proofs were removed 
from the print version of this article. However none of the missing material is 
essential to understanding the attacks. The full version is available on ePrint 
[MDFK15]. It is also available at the following link, together with implementa- 
tions of our attacks: 

https://www.dropbox.com/sh/3glwc5xl81fekre/AAASeG7D-CGKM2gLmr-UVBK9a


4 On this topic, the authors of [BBK14] note that “the full application of LWE to
multivariate cryptography is still to be explored in the future”.
2 Notation and Preliminaries

The sign $\triangleq$ denotes an equality by definition. $|S|$ denotes the cardinality of a set
$S$. The log() function denotes logarithm in base 2.


Binary Vectors. We write $\mathbb{Z}_2$ as a shorthand for $\mathbb{Z}/2\mathbb{Z}$. The set of n-bit vectors
is denoted interchangeably by $\{0,1\}^n$ or $\mathbb{Z}_2^n$. However, the vectors are always
regarded as elements of $\mathbb{Z}_2^n$ with respect to addition $+$ and dot product $\langle \cdot | \cdot \rangle$. In
particular, addition should be understood as bitwise XOR. The canonical basis of
$\mathbb{Z}_2^n$ is denoted by $e_0, \dots, e_{n-1}$.

For any $v \in \{0,1\}^n$, $v_i$ denotes the i-th coordinate of $v$. In this context, the
index i is always computed modulo n, so $v_0 = v_n$ and so forth. Likewise, if $F$ is
a function mapping into $\{0,1\}^n$, $F_i$ denotes the i-th bit of the output of $F$.

For $a \in \{0,1\}^n$, $\langle F | a \rangle$ is a shorthand for the function $x \mapsto \langle F(x) | a \rangle$.

For any $v \in \{0,1\}^n$, $\lfloor v \rfloor_k$ denotes the truncation $(v_0, \dots, v_{k-1})$ of $v$ to its
first k coordinates.

For any bit $b$, $\bar{b}$ stands for $b + 1$.


Derivative of a Binary Function. For $F : \{0,1\}^m \to \{0,1\}^n$ and $\delta \in \{0,1\}^m$,
we define the derivative of $F$ along $\delta$ as $\partial F/\partial \delta \triangleq x \mapsto F(x) + F(x + \delta)$. We write
$\partial^d F/\partial v_0 \dots \partial v_{d-1} \triangleq \partial(\dots(\partial F/\partial v_0)\dots)/\partial v_{d-1}$ for the order-d derivative along
$v_0, \dots, v_{d-1} \in \{0,1\}^m$. For convenience we may write $F'$ instead of $\partial F/\partial v$ when
$v$ is clear from the context; likewise for $F''$.

The degree of $F_i$ is its degree as an element of $\mathbb{F}_2[x_0, \dots, x_{m-1}]/(x_i^2 - x_i)$ in
the binary input variables. The degree of $F$ is the maximum of the degrees of
the $F_i$’s.


Cube. A cube of dimension d in $\{0,1\}^n$ is simply an affine subspace of dimension
d. The terminology comes from [DS09]. Note that summing a function $F$
over a cube $C$ of dimension d, i.e. computing $\sum_{c \in C} F(c)$, amounts to computing
the value of an order-d differential of $F$ at a certain point: it is equal to
$\partial^d F/\partial v_0 \dots \partial v_{d-1}(a)$ for $a$, $(v_i)$ such that $C = a + \mathrm{span}\{v_0, \dots, v_{d-1}\}$. In
particular, if $F$ has degree d, then it sums up to zero over any cube of dimension
d + 1.
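
The last property is easy to observe on a toy example. The sketch below encodes vectors of $\{0,1\}^4$ as integers and sums a degree-2 function over cubes; the function `f` and the helper `cube_sum` are hypothetical and chosen only for illustration.

```python
from itertools import product


def cube_sum(f, a, directions):
    """XOR of f over the cube a + span(directions); vectors are encoded as ints."""
    total = 0
    for coeffs in product((0, 1), repeat=len(directions)):
        point = a
        for c, v in zip(coeffs, directions):
            if c:
                point ^= v
        total ^= f(point)
    return total


def f(x):
    """A degree-2 Boolean function on 4 bits: x0*x1 + x2*x3 + x0."""
    b = [(x >> i) & 1 for i in range(4)]
    return (b[0] & b[1]) ^ (b[2] & b[3]) ^ b[0]


# Over a cube of dimension 3 = deg(f) + 1, the sum vanishes for every base point.
print(all(cube_sum(f, a, [0b0011, 0b0101, 0b1001]) == 0 for a in range(16)))  # True
# Over a cube of dimension 2, the sum need not vanish.
print(cube_sum(f, 0, [0b0001, 0b0010]))  # 1
```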


Bias. For any probability $p \in [0, 1]$, the bias of $p$ is $|2p - 1|$. Note that the bias
is sometimes defined as $|p - 1/2|$ in the literature. Our choice of definition makes
the formulation of the Piling-up Lemma more convenient [Mat94]:

Lemma 1 (Piling-up Lemma). For $X_1, \dots, X_n$ independent random binary
variables with respective biases $b_1, \dots, b_n$, the bias of $X = \sum X_i$ is $b = \prod b_i$.
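
As a sanity check, the lemma can be verified exactly on a small example by enumerating all outcomes. The probabilities below are arbitrary values chosen for illustration, and the helper names are not taken from the paper.

```python
from fractions import Fraction
from itertools import product


def bias(p):
    """Bias of a probability p, using the |2p - 1| convention."""
    return abs(2 * p - 1)


def xor_one_probability(probs):
    """Exact Pr[X_1 + ... + X_n = 1] for independent bits with Pr[X_i = 1] = probs[i]."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(probs)):
        weight = Fraction(1)
        for bit, p in zip(bits, probs):
            weight *= p if bit else 1 - p
        if sum(bits) % 2:
            total += weight
    return total


probs = [Fraction(1, 4), Fraction(1, 3), Fraction(9, 10)]
lhs = bias(xor_one_probability(probs))
rhs = bias(probs[0]) * bias(probs[1]) * bias(probs[2])
print(lhs, rhs)  # 2/15 2/15
```

With the alternative convention $|p - 1/2|$, the product would carry an extra factor $2^{n-1}$, which is the inconvenience the chosen definition avoids.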
Learning Parity with Noise (LPN). The LPN problem was introduced in
[BKW03], and may be stated as follows: given $(A, As + e)$, find $s$, where:

- $s \in \mathbb{Z}_2^n$ is a uniformly random secret vector.

- $A \in \mathbb{Z}_2^{N \times n}$ is a uniformly random binary matrix.

- $e \in \mathbb{Z}_2^N$ is an error vector, whose coordinates are chosen according to a
Bernoulli distribution with parameter p.
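
For concreteness, an LPN instance in the above sense could be sampled as follows. This is only a sketch of the problem statement, not of any solving algorithm; the function name, the parameter names (`num_samples` for the number of rows of A, `noise_rate` for p), and the use of Python's `random` module (which is not cryptographically secure) are assumptions made for illustration.

```python
import random


def lpn_instance(n, num_samples, noise_rate, rng):
    """Return (A, b, s) with b = A s + e over Z_2.

    A is a list of num_samples rows of n bits, s is the secret, and each
    coordinate of the error e is 1 with probability noise_rate.
    """
    s = [rng.randrange(2) for _ in range(n)]
    A = [[rng.randrange(2) for _ in range(n)] for _ in range(num_samples)]
    e = [1 if rng.random() < noise_rate else 0 for _ in range(num_samples)]
    b = [
        (sum(a_j & s_j for a_j, s_j in zip(row, s)) + e_i) % 2
        for row, e_i in zip(A, e)
    ]
    return A, b, s


# The attacker is given (A, b) only and must recover s.
A, b, s = lpn_instance(n=16, num_samples=64, noise_rate=0.125, rng=random.Random(0))
```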

3 Description of ASASA schemes 

3.1 Presentation and Notations 

ASASA is a general design scheme for public or secret-key ciphers (or cipher 
components). An ASASA cipher is composed of 5 interleaved layers: the letter 
A represents an affine layer, and the letter S represents a nonlinear layer (not 
necessarily made up of smaller S-boxes). Thus the cipher may be pictured as: 



We borrow the notation of [GPT15] and write the encryption function F as: 

$F = A^z \circ S^y \circ A^y \circ S^x \circ A^x$

Moreover, $x = (x_0, \dots, x_{n-1})$ is used to denote the input of the cipher; $x'$ is the
output of the first affine layer $A^x$; and so on, as pictured above. The variables
$x'_i$, $y_i$, etc., will often be viewed as polynomials over the input bits $(x_0, \dots, x_{n-1})$.
Similarly, $F$ denotes the whole encryption function, while $F^y = S^x \circ A^x$ is the
partial encryption function that maps the input $x$ to the intermediate state $y$,
and likewise $F^{x'} = A^x$, $F^{y'} = A^y \circ S^x \circ A^x$, etc.

One secret-key (“black-box”) and two public-key ASASA ciphers are presented
in [BBK14]. The secret-key and public-key variants are quite different in
nature, even though our main attack applies to both. We now present in turn the
black-box and white-box constructions and the public-key variant based on χ.

3.2 Description of the Black-Box Scheme 

It is worth noting that the following ASASA scheme is the exact counterpart of 
the SASAS structure analyzed by Biryukov and Shamir [BS01], with swapped 
affine and S-box layers. 

Black-box ASASA is a secret-key encryption scheme, parameterized by $m$,
the size of the S-boxes, and $k$, the number of S-boxes. Let $n = km$ be the number
of bits of the scheme. The overall structure of the cipher follows the ASASA
construction, with layers as follows:
- $A^x, A^y, A^z$ are random invertible affine mappings $\mathbb{Z}_2^n \to \mathbb{Z}_2^n$. Without loss of
generality, the mappings can be considered purely linear, because the affine
constant can be integrated into the preceding or following S-box layer. In the
remainder we assume the mappings to be linear.

- $S^x, S^y$ are S-box layers. Each S-box layer consists of the application of k
parallel random invertible m-bit S-boxes.

All linear layers and all S-boxes are uniformly random among invertible elements,
and independent from each other.

In the concrete instance of [BBK14], each S-box layer contains $k = 16$
S-boxes over $m = 8$ bits each, so that the scheme operates on blocks of $n = 128$
bits. The secret key consists of three n-bit matrices and $2k$ m-bit S-boxes,
so the key is $3 \cdot n^2 + 2k \cdot m 2^m$ bits long. With the previous parameters this
amounts to 14 KB.
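
The key-size figure can be reproduced with a few lines of arithmetic. The sketch below simply evaluates the formula above; the function name is illustrative, and 1 KB is taken to mean 1024 bytes.

```python
def black_box_key_bits(k, m):
    """Key size in bits: three n x n binary matrices plus 2k S-box tables."""
    n = k * m
    linear_layers = 3 * n * n          # 3 * n^2 bits
    sbox_tables = 2 * k * m * 2 ** m   # 2k tables of 2^m entries, m bits each
    return linear_layers + sbox_tables


bits = black_box_key_bits(k=16, m=8)
print(bits, bits // 8 // 1024)  # 114688 14
```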

It should be pointed out that the scheme is not IND-CPA secure. Indeed, an 
8-bit invertible S-box has algebraic degree (at most) 7, so the overall scheme has 
algebraic degree (at most) 49. Thus, the sum of ciphertexts on entries spanning a 
cube of dimension 50 is necessarily zero. As a result the security claim in [BBK14] 
is only that the secret key cannot be recovered, with a security parameter of 
128 bits. 
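
The same degree argument can be observed on a scaled-down instance. The following sketch builds a toy ASASA cipher with $k = 2$ S-boxes of $m = 3$ bits (so $n = 6$): each 3-bit permutation has degree at most 2, the whole cipher has degree at most 4, and ciphertexts therefore sum to zero over any cube of dimension 5. The parameters and all helper names are hypothetical; this is not the instance of [BBK14], and the affine constants are omitted as discussed above.

```python
import random


def rank_gf2(vectors):
    """Rank over GF(2) of a list of integers viewed as bit vectors."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b in v if it is set
        if v:
            basis.append(v)
    return len(basis)


def random_invertible_linear(n, rng):
    """Random invertible n x n binary matrix, stored as n column integers."""
    while True:
        cols = [rng.randrange(1 << n) for _ in range(n)]
        if rank_gf2(cols) == n:
            return cols


def apply_linear(cols, x):
    """Matrix-vector product over GF(2): XOR the columns selected by the bits of x."""
    y = 0
    for j, col in enumerate(cols):
        if (x >> j) & 1:
            y ^= col
    return y


def random_sbox_layer(k, m, rng):
    """k independent random permutations of {0, ..., 2^m - 1}, as lookup tables."""
    tables = []
    for _ in range(k):
        table = list(range(1 << m))
        rng.shuffle(table)
        tables.append(table)
    return tables


def apply_sboxes(tables, x, m):
    """Apply the j-th table to the j-th m-bit chunk of x."""
    mask = (1 << m) - 1
    y = 0
    for j, table in enumerate(tables):
        y |= table[(x >> (j * m)) & mask] << (j * m)
    return y


def toy_asasa(k, m, rng):
    """Return the encryption function of a random toy ASASA instance on k*m bits."""
    n = k * m
    A = [random_invertible_linear(n, rng) for _ in range(3)]
    S = [random_sbox_layer(k, m, rng) for _ in range(2)]

    def encrypt(x):
        x = apply_linear(A[0], x)
        x = apply_sboxes(S[0], x, m)
        x = apply_linear(A[1], x)
        x = apply_sboxes(S[1], x, m)
        return apply_linear(A[2], x)

    return encrypt


encrypt = toy_asasa(k=2, m=3, rng=random.Random(2015))
cube = 0
for t in range(1 << 5):   # cube spanned by e_0, ..., e_4, based at 0
    cube ^= encrypt(t)
print(cube)  # 0
```

A random permutation of 6-bit blocks would generally not have this property, which is why the scheme only claims key-recovery security.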

3.3 Description of the White-Box Scheme 

As an application of the symmetric ASASA scheme, Biryukov et al. propose its 
use as a basis for designing white-box block ciphers. In a nutshell, their idea is to 
use ASASA to create small ciphers of, say, 16-bit blocks and to use them as super 
S-boxes in e.g. a substitution-permutation network (SPN). Users of the cipher 
in the white-box model are given access to super S-boxes in the form of a table,
which allows them to encrypt and decrypt at will. Yet if the small ciphers used 
in building the super S-boxes are secure, one cannot efficiently recover their keys 
even when given access to their whole codebook, meaning that white-box users 
cannot extract a more compact description of the super S-boxes from their tables. 
This achieves weak white-box security as defined by Biryukov et al. [BBK14]: 

Definition 1 (Key Equivalence [BBK14]). Let $E : \{0,1\}^\kappa \times \{0,1\}^n \to \{0,1\}^n$
be a (symmetric) block cipher. $\mathbb{E}(k)$ is called the equivalent key set of $k$ if
for any $k' \in \mathbb{E}(k)$ one can efficiently compute $E'$ such that $\forall p\ E(k,p) = E'(k',p)$.

Definition 2 (Weak White-Box T-security [BBK14]). Let $E : \{0,1\}^\kappa \times \{0,1\}^n \to \{0,1\}^n$
be a (symmetric) block cipher. $\mathcal{W}(E)(k,\cdot)$ is said to be a
T-secure weak white-box implementation of $E(k,\cdot)$ if $\forall p\ \mathcal{W}(E)(k,p) = E(k,p)$
and if it is computationally expensive to find $k' \in \mathbb{E}(k)$ of length less than T bits
when given full access to $\mathcal{W}(E)(k,\cdot)$.

Example 1. If $S_{16}$ is a secure cipher with 16-bit blocks, then the full codebook
of $S_{16}(k,\cdot)$ as a table is a $2^{20}$-secure weak white-box implementation of $S_{16}(k,\cdot)$.
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For their instantiations, Biryukov et al. propose to use several super S-boxes 

of different sizes, among others: 

- A 16-bit $\mathsf{ASASA}_{16}$ where the nonlinear permutations S are made of the parallel
application of two 8-bit S-boxes, with conjectured security of 64 bits against
key recovery.

- A 20-bit $\mathsf{ASASA}_{20}$ where the nonlinear permutations S are made of the parallel
application of two 10-bit S-boxes, with conjectured security of 100 bits against
key recovery.

- A 24-bit $\mathsf{ASASA}_{24}$ where the nonlinear permutations S are made of the parallel
application of three 8-bit S-boxes, with conjectured security of 128 bits against
key recovery.
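
The figure $2^{20}$ in Example 1 can be read as the raw size of the codebook: $2^{16}$ entries of 16 bits each. The short sketch below computes the same quantity for the three block sizes listed above. Interpreting T as the table size in bits is our reading of the example, and the function name is purely illustrative.

```python
def codebook_bits(block_bits: int) -> int:
    """Size in bits of a full lookup table for a cipher on block_bits-bit blocks."""
    return (1 << block_bits) * block_bits

for block_bits in (16, 20, 24):
    bits = codebook_bits(block_bits)
    print(f"{block_bits}-bit blocks: {bits} bits = {bits // 8 // 1024} KiB")
```

The 16-bit table is small enough to ship anywhere, while the 24-bit one already weighs tens of megabytes, which is the kind of space/security trade-off the white-box instantiations play with.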


3.4 Description of the χ-based Public-Key Scheme

The χ mapping was introduced by Daemen [Dae95] and later used for several
cryptographic constructions, including the SHA-3 competition winner Keccak.
The mapping $\chi : \{0,1\}^n \to \{0,1\}^n$ is defined by:

$$\chi_i(a) = a_i + \overline{a_{i+1}}\, a_{i+2}$$

where indices are taken modulo $n$ and $\overline{a}$ denotes the complement $a + 1$.

The χ-based ASASA scheme presented in [BBK14] is a public-key encryption
scheme operating on 127-bit inputs, the odd size coming from the fact that χ is
only invertible on inputs of odd length. The encryption function may be written as:

$$F = A^z \circ (P + \chi \circ A^y \circ \chi \circ A^x)$$


where: 

- $A^x, A^y, A^z$ are random invertible affine mappings $\mathbb{Z}_2^{127} \to \mathbb{Z}_2^{127}$.
In the remainder we will decompose $A^x$ as a linear map $L^x$ followed by the
addition of a constant $C^x$, and likewise for $A^y, A^z$.

- χ is as above.

- $P$ is the perturbation. It is a mapping $\{0,1\}^{127} \to \{0,1\}^{127}$. For 24 output
bits at a fixed position, it is equal to a random polynomial of degree 4. On
the remaining 103 bits, it is equal to zero.

Since χ has degree only 2, the overall degree of the encryption function is 4.
The public key of the scheme is the encryption function itself, given in the form
of degree 4 polynomials in the input bits, for each output bit. The private key
is the triplet of affine maps $(A^x, A^y, A^z)$.
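
The definition of χ and the remark that it is only invertible on odd lengths can be checked exhaustively on small sizes. The following is a minimal sketch, not part of the scheme: words are encoded as Python integers with bit i holding $a_i$, and the helper names are our own.

```python
def chi(a: int, n: int) -> int:
    """Apply chi to the n-bit word a: bit i becomes a_i XOR ((NOT a_{i+1}) AND a_{i+2})."""
    out = 0
    for i in range(n):
        a_i = (a >> i) & 1
        a_next = (a >> ((i + 1) % n)) & 1
        a_next2 = (a >> ((i + 2) % n)) & 1
        out |= (a_i ^ ((a_next ^ 1) & a_next2)) << i
    return out

def is_bijective(n: int) -> bool:
    """True when chi permutes the set of all n-bit words."""
    return len({chi(a, n) for a in range(1 << n)}) == (1 << n)

for n in range(3, 10):
    print(n, is_bijective(n))
```

For even n, the alternating word 0101...01 collides with the all-zero word, which is one way to see why the scheme uses an odd size such as 127.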

Due to the perturbation, the scheme is not actually invertible. To circumvent 
this, some redundancy is required in the plaintext, and the 24 bits of perturbation 
must be guessed during decryption. The correct guess is determined first by 
checking whether the resulting plaintext has the required redundancy, and second 
by recomputing the ciphertext from the tentative plaintext and checking that it 
matches. This is not relevant to our attack, and we refer the reader to [BBK14]
for more information.

4 Structural Attack on Black-Box ASASA

Our goal in this section is to recover the secret key of the black-box ASASA
scheme, in a chosen-plaintext model. For this purpose, we begin by peeling off
the last linear layer, $A^z$. Once $A^z$ is removed, we obtain an ASAS structure, which
can be broken using Biryukov and Shamir’s techniques [BS01] in negligible time.
Thus the critical step is the first one.


4.1 Attack Overview 

Before progressing further, it is important to observe that the secret key of the 
scheme is not uniquely defined. In particular, we are free to compose the input 
and output of any S-box with a linear mapping of our choosing, and use the 
result in place of the original S-box, as long as we modify the surrounding linear 
layers accordingly. Thus, S-boxes are essentially defined up to linear equivalence. 
When we claim to recover the secret key, this should be understood as recovering 
an equivalent secret key; that is, any secret key that results in an encryption 
function identical to the black-box instance under attack. 

In particular, in order to remove the last linear layer of the scheme, it is 
enough to determine, for each S-box, the m-dimensional subspace corresponding 
to its image through the last linear layer. Indeed, we are free to pick any basis of 
this m-dimensional subspace, and assert that each element of this basis is equal 
to one bit at the output of the S-box. This will be correct, up to composing the 
output of the S-box with some invertible linear mapping, and composing the 
input of the last linear layer with the inverse mapping; which has no bearing on 
the encryption output. 
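
The equivalence argument above can be illustrated on a toy example: compose the output of an S-box with an invertible linear map, compose the input of the next layer with its inverse, and the overall function is unchanged. The sketch below uses a 4-bit S-box, a hypothetical last layer, and a self-inverse linear map chosen only to keep the code short.

```python
import random

def shear(x: int) -> int:
    """Invertible GF(2)-linear map on 4-bit words (bit 1 += bit 0); it is its own inverse."""
    return x ^ ((x & 1) << 1)

def last_layer(y: int) -> int:
    """Hypothetical affine output layer on 4 bits: a rotation plus a constant."""
    return (((y << 1) | (y >> 3)) & 0xF) ^ 0b1010

rng = random.Random(0)
sbox = list(range(16))
rng.shuffle(sbox)

# Equivalent key: a different S-box, compensated for in the following layer.
sbox_equiv = [shear(s) for s in sbox]

def last_layer_equiv(y: int) -> int:
    return last_layer(shear(y))

same_function = all(last_layer(sbox[x]) == last_layer_equiv(sbox_equiv[x])
                    for x in range(16))
print(same_function)
```

The two key descriptions differ as tables but define the same encryption function, which is why the attack only aims at an equivalent key.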

Thus, peeling off $A^z$ amounts to finding the image space of each S-box
through $A^z$. For this purpose, we will look for linear masks $a, b \in \{0,1\}^n$ over
the output of the cipher, such that the two dot products $\langle F|a\rangle$ and $\langle F|b\rangle$ of the
encryption function $F$ along each mask are each equal to one bit at the output
of the same S-box in the last nonlinear layer $S^y$. Let us denote the set of such
pairs $(a, b)$ by $\mathcal{S}$ (as in “solution”).

In order to compute $\mathcal{S}$, the core property at play is that if masks $a$ and $b$ are
as required, then the binary product $\langle F|a\rangle\langle F|b\rangle$ has degree only $(m-1)^2$ over
the input variables of the cipher (meaning that $\langle F|a\rangle\langle F|b\rangle$ sums to zero over any
cube of dimension $(m-1)^2 + 1$), whereas it has degree $2(m-1)^2$ in general.

We define the two linear masks $a$ and $b$ we are looking for as two vectors
of binary unknowns. Then $f(a, b) = \langle F|a\rangle\langle F|b\rangle$ may be expressed as a
quadratic polynomial over these unknowns, whose coefficients are $\langle F|e_i\rangle\langle F|e_j\rangle$
for $(e_i)$ the canonical basis of $\mathbb{Z}_2^n$. Now, the fact that $f(a, b)$ sums to zero over
some cube $C$ gives us a quadratic condition on $(a, b)$, whose coefficients are
$\sum_{c \in C} \langle F(c)|e_i\rangle\langle F(c)|e_j\rangle$.

By computing $n(n-1)/2$ cubes of dimension $(m-1)^2 + 1$, we thus derive
$n(n-1)/2$ quadratic conditions on $(a, b)$. The resulting system can then be solved
by relinearization. This yields the linear space $K$ spanned by $\mathcal{S}$.

However we want to recover $\mathcal{S}$, rather than its linear combinations $K$. Thus in
a second step, we compute $\mathcal{S}$ as $\mathcal{S} = K \cap P$, where $P$ is essentially the set of
elements that stem from a single product of two masks $a$ and $b$. While $P$ is not
a linear space, by guessing a few bits of the masks $a, b$, we can get many linear
constraints on the elements of $P$ satisfying these guesses, and intersect these
linear constraints with $K$.

The first step may be regarded as the core of the attack, and it is also
the computationally most expensive: essentially we need to encrypt plaintexts
spanning $n(n-1)/2$ cubes of dimension $(m-1)^2 + 1$. We recall that in the
actual black-box scheme of [BBK14], we have S-boxes over $m = 8$ bits, and the
total block size is $n = 128$ bits, covered by $k = 16$ S-boxes, so the complexity
is dominated by the computation of the encryption function over $2^{13}$ cubes of
dimension 50, i.e. $2^{63}$ encryptions.

4.2 Description of the Attack 

We use the notation of Sect. 3.1: let $F = A^z \circ S^y \circ A^y \circ S^x \circ A^x$ denote the
encryption function. We are interested in linear masks $a \in \{0,1\}^n$ such that
$\langle F|a\rangle$ depends only on the output of one S-box. Since
$\langle F|a\rangle = \langle S^y \circ A^y \circ S^x \circ A^x \,|\, (A^z)^T a\rangle$, this is equivalent to saying that the
active bits of $(A^z)^T a$ span a single S-box.

In fact we are searching for the set $\mathcal{S}$ of pairs of masks $(a, b)$ such that $(A^z)^T a$
and $(A^z)^T b$ span the same single S-box. Formally, if we let $(e_0, \dots, e_{n-1})$ be the
canonical basis of $\mathbb{Z}_2^n$, and let $O_t = \mathrm{span}\{e_i : mt \le i < m(t+1)\}$ be the span of
the output of the $t$-th S-box, then:

$$\mathcal{S} = \{(a, b) \in \{0,1\}^n \times \{0,1\}^n : \exists\, t,\ (A^z)^T a \in O_t \text{ and } (A^z)^T b \in O_t\}$$

The core property exploited in the attack is that if $(a, b)$ belongs to $\mathcal{S}$, then
$\langle F|a\rangle\langle F|b\rangle$ has degree at most $(m-1)^2$, as shown by Lemma 2 below. On the
other hand, if $(a, b) \notin \mathcal{S}$, then $\langle F|a\rangle\langle F|b\rangle$ is akin to the product of two
independent random polynomials of degree $(m-1)^2$, and it reaches degree $2(m-1)^2$
with overwhelming probability.

Lemma 2. Let $G$ be an invertible mapping $\{0,1\}^m \to \{0,1\}^m$ for $m > 2$. For
any two $m$-bit linear masks $a$ and $b$, $H = \langle G|a\rangle\langle G|b\rangle$ has degree at most $m - 1$.

Proof. It is clear that the degree cannot exceed $m$, since we depend on only $m$
variables (and we live in $\mathbb{F}_2$). What we show is that it is at most $m - 1$, as long
as $m > 2$. If $a = 0$ or $b = 0$ or $a = b$, this is clear, so we can assume that $a, b$ are
linearly independent. Note that there is only one possible monomial of degree
$m$, and its coefficient is equal to $\sum_{x \in \{0,1\}^m} H(x)$. So all we have to show is that
this sum is zero.

Because $G$ is invertible, $G(x)$ spans each value in $\{0,1\}^m$ once as $x$ spans
$\{0,1\}^m$. As a consequence, the pair $(\langle G|a\rangle, \langle G|b\rangle)$ takes each of its 4 possible
values an equal number of times. In particular, it takes the value $(1, 1)$ exactly
1/4 of the time. Hence $\langle G|a\rangle\langle G|b\rangle$ takes the value 1 exactly $2^{m-2}$ times, which
is even for $m > 2$. Thus $\sum_{x \in \{0,1\}^m} H(x) = 0$ and we are done. □

In the remainder, we regard two masks $a$ and $b$ as two sequences of $n$ binary
unknowns $(a_0, \dots, a_{n-1})$ and $(b_0, \dots, b_{n-1})$.
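
Lemma 2 lends itself to an exhaustive check on small S-boxes. The sketch below draws a random permutation on $m = 4$ bits and verifies, for every pair of masks, that the coefficient of the unique degree-$m$ monomial of $\langle G|a\rangle\langle G|b\rangle$ (the sum of the product over all inputs) vanishes. The permutation, seed and helper names are illustrative assumptions.

```python
import random

def parity(x: int) -> int:
    return bin(x).count("1") & 1

def top_coefficient(perm, a: int, b: int) -> int:
    """Coefficient of the degree-m monomial of <G|a><G|b>: the sum over all inputs, mod 2."""
    return sum(parity(y & a) & parity(y & b) for y in perm) & 1

m = 4
rng = random.Random(1)
perm = list(range(1 << m))
rng.shuffle(perm)

coefficients = {top_coefficient(perm, a, b)
                for a in range(1 << m) for b in range(1 << m)}
print(coefficients)

# The hypothesis m > 2 matters: on 2 bits the identity map already reaches degree 2.
print(top_coefficient([0, 1, 2, 3], 0b01, 0b10))
```

The second call shows why the lemma excludes $m = 2$: there the product takes the value 1 exactly $2^{m-2} = 1$ time, which is odd.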

Step 1: Kernel Computation. If $a, b$ are as desired, $\langle F|a\rangle\langle F|b\rangle$ has degree at
most $(m-1)^2$. Hence the sum of this product over a cube of dimension $(m-1)^2 + 1$
is zero, as this amounts to an order-$((m-1)^2 + 1)$ differential of a degree-$(m-1)^2$
function. Let then $C$ denote a random cube of dimension $(m-1)^2 + 1$, that is,
a random affine space of dimension $(m-1)^2 + 1$ over $\{0,1\}^n$. We have:

$$\sum_{c \in C} \langle F(c)|a\rangle \langle F(c)|b\rangle
    = \sum_{c \in C} \sum_{i < n} a_i F_i(c) \sum_{j < n} b_j F_j(c)$$

$$= \sum_{i,j < n} \Big( \sum_{c \in C} F_i(c) F_j(c) \Big) a_i b_j$$

$$= \sum_{i < j < n} \Big( \sum_{c \in C} F_i(c) F_j(c) \Big) (a_i b_j + a_j b_i)$$

To deduce the last line, notice that $\sum_{c \in C} F_i F_i = 0$ since $F$ has degree less
than $\dim C$. Since the equation above really only says something about $a_i b_j + a_j b_i$
rather than $a_i b_j$ (which is unavoidable, since the roles of $a$ and $b$ are symmetric),
we define $E = \mathbb{Z}_2^{n(n-1)/2}$, see its canonical basis as $e_{i,j}$ for $i < j < n$, and define
$\lambda(a, b) \in E$ by: $\lambda_{i,j}(a, b) = a_i b_j + a_j b_i$. By convention we set $\lambda_{j,i} = \lambda_{i,j}$ and
$\lambda_{i,i} = 0$. The previous equation tells us that knowing only the $n(n-1)/2$ bits
$\sum_{c \in C} F_i(c) F_j(c)$ yields a quadratic condition on $(a, b)$, and more specifically a
linear condition on $\lambda(a, b)$. Whence we proceed as follows:


Algorithm 1: GenerateCondition

Input: A random cube $C$ of dimension $(m-1)^2 + 1$ over $\{0,1\}^n$
    Let sum = $(0, \dots, 0) \in E$
    for $c \in C$ do
        $(x_0, \dots, x_{n-1}) \leftarrow F(c)$
        $t \leftarrow (x_i x_j$ for $i < j < n) \in E$
        sum $\leftarrow$ sum + $t$
    return sum


Let $M$ be a binary matrix of size $(n^2/2) \times (n(n-1)/2)$, whose rows are
separate outputs of Algorithm 1. Let $K$ be the kernel of this matrix. Then for
all $(a, b) \in \mathcal{S}$, $\lambda(a, b)$ is necessarily in $K$. Thus $K$ contains the span of the $\lambda(a, b)$’s
for $(a, b) \in \mathcal{S}$. Because $M$ contains more than $n(n-1)/2$ rows, with overwhelming
probability $K$ contains no other vector (see footnote 5). This is confirmed by our experiments.
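
Algorithm 1 and the claim that every $\lambda(a, b)$ with $(a, b) \in \mathcal{S}$ satisfies each generated condition can be exercised on a toy instance. The sketch below is not the authors' implementation: it assumes tiny hypothetical parameters ($m = 3$, $k = 2$, so $n = 6$ and cubes of dimension 5), builds a random ASASA instance, finds $\mathcal{S}$ by brute force using the secret $L^z$, and counts violated conditions. It does not compute the kernel $K$, since such small parameters need not reproduce the expected kernel dimension.

```python
import random
from itertools import combinations

M_BITS, BOXES = 3, 2                       # toy S-box width m and S-box count k
N = M_BITS * BOXES                         # block size n = 6
CUBE_DIM = (M_BITS - 1) ** 2 + 1           # (m-1)^2 + 1 = 5
PAIRS = list(combinations(range(N), 2))    # coordinates (i, j), i < j, of the space E
rng = random.Random(2015)

def parity(x):
    return bin(x).count("1") & 1

def rank(vectors):
    """Rank over GF(2) of a list of vectors encoded as integers."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)

def random_affine():
    """Random invertible affine map, stored as (rows of L, constant C)."""
    while True:
        rows = [rng.getrandbits(N) for _ in range(N)]
        if rank(rows) == N:
            return rows, rng.getrandbits(N)

def apply_affine(layer, x):
    rows, const = layer
    return sum(parity(row & x) << i for i, row in enumerate(rows)) ^ const

def random_sboxes():
    boxes = [list(range(1 << M_BITS)) for _ in range(BOXES)]
    for box in boxes:
        rng.shuffle(box)
    return boxes

def apply_sboxes(boxes, x):
    low = (1 << M_BITS) - 1
    return sum(box[(x >> (M_BITS * t)) & low] << (M_BITS * t)
               for t, box in enumerate(boxes))

AX, AY, AZ = random_affine(), random_affine(), random_affine()
SX, SY = random_sboxes(), random_sboxes()

def F(x):
    """Toy ASASA encryption F = A^z o S^y o A^y o S^x o A^x."""
    y_prime = apply_affine(AY, apply_sboxes(SX, apply_affine(AX, x)))
    return apply_affine(AZ, apply_sboxes(SY, y_prime))

def random_cube():
    """All points of a random affine subspace of dimension CUBE_DIM."""
    while True:
        directions = [rng.getrandbits(N) for _ in range(CUBE_DIM)]
        if rank(directions) == CUBE_DIM:
            break
    points = [rng.getrandbits(N)]
    for d in directions:
        points += [p ^ d for p in points]
    return points

def generate_condition(cube):
    """Algorithm 1: sum over the cube of the vector (x_i x_j), i < j."""
    total = [0] * len(PAIRS)
    for c in cube:
        x = F(c)
        for idx, (i, j) in enumerate(PAIRS):
            total[idx] ^= (x >> i) & (x >> j) & 1
    return total

def lam(a, b):
    """lambda(a, b) = (a_i b_j + a_j b_i) for i < j."""
    return [((a >> i) & (b >> j) & 1) ^ ((a >> j) & (b >> i) & 1) for i, j in PAIRS]

def solution_pairs():
    """Brute-force S: pairs (a, b) with (L^z)^T a and (L^z)^T b in the same O_t."""
    def transposed(a):   # (L^z)^T a is the XOR of the rows of L^z selected by a
        out = 0
        for i, row in enumerate(AZ[0]):
            if (a >> i) & 1:
                out ^= row
        return out
    pairs = []
    for t in range(BOXES):
        outside = ~(((1 << M_BITS) - 1) << (M_BITS * t))
        good = [a for a in range(1, 1 << N) if (transposed(a) & outside) == 0]
        pairs += [(a, b) for a in good for b in good if a != b]
    return pairs

conditions = [generate_condition(random_cube()) for _ in range(N * N // 2)]
pairs = solution_pairs()
violations = sum(sum(c & l for c, l in zip(cond, lam(a, b))) & 1
                 for cond in conditions for a, b in pairs)
print(len(pairs), "pairs in S,", violations, "violated conditions")
```

In an actual attack the set $\mathcal{S}$ is of course unknown; the conditions are stacked into the matrix $M$ and the solutions are sought inside its kernel.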

[Footnote 5] This point is the only reason we pick $n^2/2$ rows rather than only
$n(n-1)/2$; but we may as easily choose $n(n-1)/2$ plus some small constant. In
practice we can just pick $n(n-1)/2$ rows, and add more as required until the kernel
has the expected dimension $km(m-1)/2$.

Complexity Analysis. Overall, the dominant cost is to compute $2^{(m-1)^2+1}$
encryptions per cube, for $n^2/2$ cubes, which amounts to a total of $n^2\, 2^{(m-1)^2}$
encryptions. With the parameters of [BBK14], this is $2^{63}$ encryptions. In practice,
we could limit ourselves to dimension-$((m-1)^2 + 1)$ subcubes of a single
dimension-$((m-1)^2 + 2)$ cube, which would cost only $2^{(m-1)^2+2}$ encryptions.
However we would still need to sum (pairwise bit products of) ciphertexts for
each subcube, so while this approach would certainly be an improvement in
practice, we believe it is cleaner to simply state the complexity as $n^2\, 2^{(m-1)^2}$
encryption equivalents.

Beside that, we also need to compute the kernel of a matrix of dimension
$n(n-1)/2$, which incurs a cost of roughly $n^6/8$ basic linear operations. With
the parameters of [BBK14], we need to invert a binary matrix of dimension $2^{13}$,
costing around $2^{39}$ (in practice, highly optimized) operations, so this is negligible
compared to the required number of encryptions.
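
The figures quoted for Step 1 follow directly from the parameters $n = 128$, $m = 8$. The short computation below reproduces them; the variable names are ours, and the cubic cost model for the kernel is the one stated in the text.

```python
from math import log2

n, m = 128, 8                                  # black-box parameters of [BBK14]
cube_dim = (m - 1) ** 2 + 1                    # dimension of each cube
encryptions = (n * n // 2) * 2 ** cube_dim     # n^2/2 cubes, 2^cube_dim points each
kernel_ops = n ** 6 // 8                       # roughly (n^2/2)^3 linear operations

print(cube_dim, log2(encryptions), log2(kernel_ops))
```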


Step 2: Extracting Masks. Let:

$$P = \{\lambda \in E : \exists\, a, b \in \{0,1\}^n,\ \lambda = \lambda(a, b)\}$$

Clearly we have $\lambda(\mathcal{S}) \subseteq K \cap P$. In fact, we assume $\lambda(\mathcal{S}) = K \cap P$, which is
confirmed by our experiments. We now want to compute $K \cap P$.

However we do not need to enumerate the whole intersection $K \cap P$ directly:
for our purpose, it suffices to recover enough elements of $\lambda(\mathcal{S})$ such that the
corresponding masks span the output space of all S-boxes. Indeed, recall that
our end goal is merely to find the image of all $k$ S-boxes through the last linear
layer. Thus, in the remainder, we explain how to find a random element in $K \cap P$.
Once we have found $km$ linearly independent masks in this manner, we will be
done.

The general idea to find a random element of $K \cap P$ is as follows. We begin
by guessing the value of a few pairs $(a_i, b_i)$. This yields linear constraints on
the $\lambda_{i,j}$’s. As an example, if $(a_0, b_0) = (0, 0)$, then $\forall i,\ \lambda_{0,i} = 0$. Because the
constraints are linear and so is the space $K$, finding the elements of $K$ satisfying
the constraints only involves basic linear algebra. Thus, all we have to do is
guess enough constraints to single out an element of $\mathcal{S}$ with constant probability,
and recover that element as the one-dimensional subspace of $K$ satisfying the
constraints.

More precisely, assume we guess $2r$ bits of $a, b$ as:

$$a_0, \dots, a_{r-1} = \alpha_0, \dots, \alpha_{r-1}$$
$$b_0, \dots, b_{r-1} = \beta_0, \dots, \beta_{r-1}$$

We view pairs $(\alpha_i, \beta_i)$ as elements of $\mathbb{Z}_2^2$. Assume there exists some linear
dependency between the $(\alpha_i, \beta_i)$’s: that is, for some $(\mu_i) \in \{0,1\}^r$:

$$\sum_{i=0}^{r-1} \mu_i (\alpha_i, \beta_i) = (0, 0)$$

Then for all $j < n$, we have:

$$\sum_{i=0}^{r-1} \mu_i \lambda_{i,j} = b_j \sum_{i=0}^{r-1} \mu_i \alpha_i + a_j \sum_{i=0}^{r-1} \mu_i \beta_i = 0 \qquad (1)$$

Now, since $\mathbb{Z}_2^2$ has dimension only 2, we can be sure that there exist $r - 2$
independent linear relations between the $(\alpha_i, \beta_i)$’s, from which we deduce as
above $(r-2)n$ linear relations on the $\lambda_{i,j}$’s. In the full version of this article (see
Sect. 1.3), we prove that at least $(r-2)(n-r)$ of these relations are linearly
independent.
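
Equation (1) can be checked numerically: whenever a combination of the guessed pairs sums to $(0, 0)$, the same combination of the $\lambda_{i,j}$ vanishes for every $j$. The sketch below does this for random masks with hypothetical toy sizes ($n = 12$, $r = 5$), taking the guess to be correct, and also counts the dependencies, of which there are at least $2^{r-2}$ including the trivial one.

```python
import random
from itertools import product

def check_relations(a, b, r):
    """Count dependencies mu among the first r pairs (a_i, b_i); test Eq. (1) for each."""
    n = len(a)

    def lam(i, j):
        return (a[i] & b[j]) ^ (a[j] & b[i])

    dependencies, all_hold = 0, True
    for mu in product((0, 1), repeat=r):
        sum_a = sum(mu[i] & a[i] for i in range(r)) % 2
        sum_b = sum(mu[i] & b[i] for i in range(r)) % 2
        if (sum_a, sum_b) != (0, 0):
            continue
        dependencies += 1
        for j in range(n):
            if sum(mu[i] & lam(i, j) for i in range(r)) % 2 != 0:
                all_hold = False
    return dependencies, all_hold

rng = random.Random(7)
n, r = 12, 5
a = [rng.getrandbits(1) for _ in range(n)]
b = [rng.getrandbits(1) for _ in range(n)]
dependencies, all_hold = check_relations(a, b, r)
print(dependencies, all_hold)
```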

Now, the cardinality of $\mathcal{S}$ is $k(2^m - 1)(2^m - 2) \approx k\, 2^{2m}$. Hence if we choose
$r = \lfloor \log(|\mathcal{S}|)/2 \rfloor \approx m + \frac{1}{2} \log k$, and randomly guess the values of $(a_i, b_i)$ for
$i < r$, then we can expect that with constant probability there exists exactly one
element in $\mathcal{S}$ satisfying our guess. More precisely, each element has a probability
(close to) $2^{-2\lfloor \log(|\mathcal{S}|)/2 \rfloor} \approx |\mathcal{S}|^{-1}$ of fitting our guess of $2r$ bits, so this probability
is close to $|\mathcal{S}| \cdot |\mathcal{S}|^{-1} (1 - |\mathcal{S}|^{-1})^{|\mathcal{S}|-1} \approx 1/e$. Thus, if we denote by $T$ the
subspace of $E$ of vectors satisfying the linear constraints induced by our guess, with
probability roughly 1/3, $\lambda(\mathcal{S}) \cap T$ contains a single element.

On the other hand, $K$ is generated by pairs of masks corresponding to distinct
bits for each S-box in $S^y$. Hence $\dim K = km(m-1)/2 = n(m-1)/2$. As shown
earlier, from our $2r$ guesses, we deduce (at least) $(r-2)(n-r)$ linear conditions
on the $\lambda_{i,j}$’s, so $\mathrm{codim}\, T \ge (r-2)(n-r)$. Since we chose $r = m + \frac{1}{2} \log k$, this
means:

$$\mathrm{codim}\, T \ge (m - 2 + \tfrac{1}{2} \log k) \cdot (n - m - \tfrac{1}{2} \log k)$$
$$\dim K = (m - 1) \cdot (n/2)$$

Thus, having $\frac{1}{2} \log k \ge 1$, i.e. $k \ge 4$, and $m + \frac{1}{2} \log k \le n/2$, which is easily the
case with concrete parameters $m = 8$, $k = 16$, $n = 128$, we have $\mathrm{codim}\, T \ge \dim K$
and so $K \cap T$ is not expected to contain any extra vector beside the span
of $\lambda(\mathcal{S}) \cap T$. This is confirmed by our experiments.
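
For the concrete parameters $m = 8$, $k = 16$, the quantities in this argument are easy to evaluate. The snippet below is only a numerical illustration with our own variable names; it takes logarithms in base 2, as the text does implicitly.

```python
from math import e, log2

m, k = 8, 16
n = m * k
size_S = k * (2 ** m - 1) * (2 ** m - 2)          # |S|
r = m + int(log2(k)) // 2                          # r = m + (1/2) log k
p_unique = (1 - 1 / size_S) ** (size_S - 1)        # close to 1/e
codim_T_bound = (r - 2) * (n - r)                  # lower bound on codim T
dim_K = n * (m - 1) // 2

print(log2(size_S), r, p_unique, codim_T_bound, dim_K)
```

The lower bound on codim T is roughly twice dim K here, which leaves a comfortable margin for the heuristic that $K \cap T$ contains nothing beyond the span of $\lambda(\mathcal{S}) \cap T$.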

In summary, if we pick $r = m + \frac{1}{2} \log k$ and randomly guess the first $r$ pairs
of bits $(a_i, b_i)$, then with probability close to $1/e$, $K \cap T$ contains only a single
vector, which belongs to $\lambda(\mathcal{S}) \cap T$ and in particular to $\lambda(\mathcal{S})$. In practice it may
be worthwhile to guess a little less than $m + \frac{1}{2} \log k$ pairs to ensure $K \cap T$ is
nonzero, then guess more as needed to single out a solution. Once we have a
single element in $\lambda(\mathcal{S})$, it is easy to recover the two masks $(a, b)$ it stems from
(see footnote 6).

In the end, we recover two masks $(a, b)$ coming from the same S-box. If we
repeat this process $n = km$ times on average, the masks we recover will span
the output of each S-box (indeed we recover 2 masks each time, so $n$ tries is
more than enough with high probability). Furthermore, checking whether two
masks belong to the same S-box is very cheap (for two masks $a, b$, we only need
to check whether $\lambda(a, b)$ is in $K$), so we recover the output space of each S-box.

[Footnote 6] It can be shown that $\lambda$ is invertible except on its zero output, which
is reached only when $a = 0$, $b = 0$ or $a = b$. An inversion algorithm is given in the
full version of this article (cf. Sect. 1.3).

Complexity Analysis. In order to get a random element in $\mathcal{S}$, each guess of $2r$
bits yields roughly 1/3 chance of recovering an element by intersecting linear
spaces $K$ and $T$. Since $K$ has dimension $n(m-1)/2$, the complexity is roughly
$(n(m-1)/2)^3$ per try, and we need 3 tries on average for one success. Then the
process must be repeated $n$ times. Thus the complexity may be evaluated to
roughly $\frac{3}{8} n^4 (m-1)^3$ basic linear operations. With the parameters of [BBK14],
this amounts to $2^{36}$ linear operations, so this step is negligible compared to
Step 1 (and quite practical besides).

Before closing this section, we note that our attack does not really depend
on the randomness of the S-boxes or affine layers. All that is required of the
S-boxes is that the degree of $z_i z_j$ vary depending on whether $i$ and $j$ belong to
the same S-box. This makes the attack quite general, in the same sense as the
structural attack of [BS01].

5 Attacks on the χ-based Public-Key Scheme

In this section, our goal is to recover the private key of the χ-based ASASA
scheme, using only the public key. For this purpose, we peel off one layer at a
time, starting with the last affine layer $A^z$. We actually propose two different
ways to achieve this. The first attack is our main algebraic attack from Sect. 4,
with some modifications to account for the peculiarity of χ and the presence
of the perturbation. It is presented in Sect. 5.1. The second attack reduces the
problem to an instance of LPN, and is presented in Sect. 5.2. Once the last
affine layer has been removed with either attack, we move on to attacking the
remaining layers in Sect. 5.3.


5.1 Algebraic Attack on the χ Scheme

The χ scheme can be attacked in exactly the same manner as the black-box
scheme in Sect. 4. Using the notations of Sect. 3.1, we have:

$$z_i z_{i+1} = (y'_i + \overline{y'_{i+1}}\, y'_{i+2}) \cdot (y'_{i+1} + \overline{y'_{i+2}}\, y'_{i+3})$$
$$= y'_i y'_{i+1} + y'_i\, \overline{y'_{i+2}}\, y'_{i+3}$$

Here the crucial point is that $y'_{i+2}$ is shared by the only degree-4 term of both
factors. Thus the degree of $z_i z_{i+1}$ is bounded by 6. Likewise, the degree of
$z_{i+1}(z_i + z_{i+2}) = z_i z_{i+1} + z_{i+1} z_{i+2}$ is also bounded by 6, as the sum of two products
of the previous form. On the other hand, any product of linear combinations
$(\sum \alpha_i z_i)(\sum \beta_i z_i)$ not of the previous two forms does not share common $y'_i$’s in
its higher-degree terms, so no simplification occurs, and the product reaches
degree 8 with overwhelming probability.
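
The cancellation behind the degree-6 bound is a plain Boolean identity in the four values $y'_i, \dots, y'_{i+3}$, so it can be confirmed by enumerating all 16 assignments. In this sketch the arguments `a, b, c, d` stand for $y'_i, y'_{i+1}, y'_{i+2}, y'_{i+3}$; the function names are ours.

```python
from itertools import product

def product_of_outputs(a, b, c, d):
    """z_i * z_{i+1} with z_i = a + (not b) c and z_{i+1} = b + (not c) d, over GF(2)."""
    return (a ^ ((b ^ 1) & c)) & (b ^ ((c ^ 1) & d))

def simplified(a, b, c, d):
    """Claimed simplification: a b + a (not c) d."""
    return (a & b) ^ (a & (c ^ 1) & d)

identity_holds = all(product_of_outputs(*v) == simplified(*v)
                     for v in product((0, 1), repeat=4))
print(identity_holds)
```

Since each $y'$ is quadratic in the plaintext bits, the surviving term with three factors has degree at most 6, as stated.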

As a result, we can proceed as in Sect. 4. Let $n = 127$ be the size of the
scheme, $p = 24$ the number of perturbation polynomials. The positions of the
$p$ perturbation polynomials are not defined in the original paper; in the sequel
we assume that they are next to each other. Other choices of positions increase
the tedium of the attack rather than its difficulty. A brief discussion of random
positions for perturbation polynomials is offered in the full version of this article
(see Sect. 1.3). Due to the rotational symmetry of χ, the positions of the
perturbed bits are only defined modulo rotational symmetry; for convenience, we
assume that perturbed bits are at positions $z_{n-p}$ to $z_{n-1}$.

The full attack presented below has been verified experimentally for small 
values of n. 

Step 1: Kernel Computation. We fill the rows of an $n(n-1)/2 \times n(n-1)/2$
matrix with separate outputs of Algorithm 1, with the difference that the dimension
of cubes in the algorithm is only 7 (instead of $(m-1)^2 + 1 = 50$ in the black-box
case). Then we compute the kernel $K$ of this matrix. Since $n(n-1)/2 \approx 2^{13}$
the complexity of this step is roughly $2^{39}$ basic linear operations.

Step 2: Extracting Masks. The second step is to intersect $K$ with the set $P$
of elements of the form $\lambda(a, b)$ to recover actual solutions (see Sect. 4, step 2). In
Sect. 4 we were content with finding random elements of $K \cap P$. Now we want to
find all of them. To do so, instead of guessing a few pairs $(a_i, b_i)$ as earlier, we
exhaust all possibilities for $(a_0, b_0)$ then $(a_1, b_1)$ and so forth along a tree-based
search. For each branch, we stop when the dimension of $K$ intersected with the
linear constraints stemming from our guesses of $(a_i, b_i)$’s is reduced to 1. Each
branch yields a solution $\lambda(a, b)$, from which the two masks $a$ and $b$ can be easily
recovered.

Step 3: Sorting Masks. Let $a_i = ((L^z)^T)^{-1} e_i$ be the linear mask such that
$z_i = \langle F|a_i\rangle$ (for the sake of clarity we first assume $C^z = 0$; this has no impact
on the attack until step 4 in Sect. 5.3 where we will recover $C^z$). At this point
we have recovered the set $\mathcal{S}$ of all (unordered) pairs of masks $\{a_i, a_{i+1}\}$ and
$\{a_i, a_{i-1} + a_{i+1}\}$ for $i < n - p$, i.e. such that the corresponding $z_i$’s are not
perturbed. Now we want to distinguish masks $a_{i-1} + a_{i+1}$ from masks $a_i$. For
each $i$ such that $z_{i-1}, z_i, z_{i+1}$ are not perturbed, this is easy enough, as $a_i$ appears
exactly three times among unordered pairs in $\mathcal{S}$: namely in the pairs $\{a_i, a_{i-1}\}$,
$\{a_i, a_{i+1}\}$ and $\{a_i, a_{i-1} + a_{i+1}\}$; whereas masks of the form $a_{i-1} + a_{i+1}$ appear
only once, in $\{a_{i-1} + a_{i+1}, a_i\}$.

Thus we have recovered every $a_i$ for which $z_{i-1}, z_i, z_{i+1}$ are not perturbed.
Since perturbed bits are next to each other, we have recovered all unperturbed
$a_i$’s save the two $a_i$’s on the outer edge of the perturbation, i.e. $a_0$ and $a_{n-p-1}$.
We can also order all recovered $a_i$’s simply by checking whether $\{a_i, a_{i+1}\}$ is in $\mathcal{S}$.
In other words, we look at $\mathcal{S}$ as the set of edges of a graph whose vertices are the
elements of pairs in $\mathcal{S}$; then the chain $(a_1, \dots, a_{n-p-2})$ is simply the longest path
in this graph. In fact we recover $(a_1, \dots, a_{n-p-2})$, minus its direction: that is, so
far, we cannot distinguish it from $(a_{n-p-2}, \dots, a_1)$. If we look at the neighbours
of the end points of the path, we also recover $\{a_0, a_0 + a_2\}$ and
$\{a_{n-p-1}, a_{n-p-3} + a_{n-p-1}\}$. However we are not equipped to tell apart the members
of each pair with only $\mathcal{S}$ at our disposal.

To find $a_0$ in $\{a_0, a_0 + a_2\}$ (and likewise $a_{n-p-1}$ in $\{a_{n-p-1}, a_{n-p-3} + a_{n-p-1}\}$),
a very efficient technique is to anticipate a little and use the distinguisher
in Sect. 5.2. Namely, in short, we differentiate the encryption function
$F$ twice using two fixed random input differences $\delta_1 \ne \delta_2$, and check whether
for a fraction 1/4 of possible choices of $(\delta_1, \delta_2)$, $\langle \partial^2 F / \partial\delta_1 \partial\delta_2 \,|\, x \rangle$ is equal to a
constant with bias $2^{-4}$: this property holds if and only if $x$ is one of the $a_i$’s.
This only requires around $2^{16}$ encryptions for each choice of $(\delta_1, \delta_2)$, and thus
completes in negligible time. Another more self-contained approach is to move
on to the next step (in Sect. 5.3), where the algorithm we use is executed separately
on each recovered mask $a_i$, and fails for $a_0 + a_2$ but not $a_0$. However this
would be slower in practice.
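
The counting and ordering argument of Step 3 can be sketched on a toy model. The code below is a simplified illustration under stated assumptions: the unperturbed masks $a_0, \dots, a_{q-1}$ are modelled as unit vectors, the set of pairs contains exactly $\{a_i, a_{i+1}\}$ and $\{a_i, a_{i-1} + a_{i+1}\}$ for the indices where all masks involved are unperturbed, and the function and variable names are ours.

```python
from collections import Counter

def sort_masks(pairs):
    """Return the masks seen exactly three times, ordered as a chain by adjacency."""
    counts = Counter(mask for pair in pairs for mask in pair)
    interior = {mask for mask, count in counts.items() if count == 3}
    neighbours = {mask: set() for mask in interior}
    for pair in pairs:
        u, v = tuple(pair)
        if u in interior and v in interior:
            neighbours[u].add(v)
            neighbours[v].add(u)
    # Start from one end of the path (the direction is arbitrary at this stage).
    start = min(mask for mask in interior if len(neighbours[mask]) == 1)
    chain, previous = [start], None
    while True:
        following = [v for v in neighbours[chain[-1]] if v != previous]
        if not following:
            return chain
        previous = chain[-1]
        chain.append(following[0])

# Toy instance with q unperturbed positions; a[i] plays the role of the mask a_i.
q = 8
a = [1 << i for i in range(q)]
pairs = {frozenset((a[i], a[i + 1])) for i in range(q - 1)}
pairs |= {frozenset((a[i], a[i - 1] ^ a[i + 1])) for i in range(1, q - 1)}

chain = sort_masks(pairs)
print(chain)
```

As in the text, the two outermost masks are not recovered by counting alone, and the chain is only determined up to its direction.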

We assume either solution was chosen and we now know the whole ordered
chain $(a_0, \dots, a_{n-p-1})$ of masks corresponding to unperturbed bits. At this
stage we are only missing the direction of the chain, i.e. we cannot distinguish
$(a_0, \dots, a_{n-p-1})$ from $(a_{n-p-1}, \dots, a_0)$. This will be corrected at the next step.

As mentioned earlier, we propose two different techniques to recover the
last linear layer of the χ scheme: one algebraic technique, and another based on
LPN. We have now just completed the algebraic technique. In the next section we
present the LPN-based technique. Afterwards we will move on to the remaining
steps, which are common to both techniques, and fully break the cipher with
the knowledge of $(a_0, \dots, a_{n-p-1})$, in Sect. 5.3.

5.2 LPN-based Attack on the χ Scheme

We now present a different approach to remove the last linear layer of the χ
scheme. This approach relies on the fact that each output bit of χ is almost
linear, in the sense that the only nonlinear component is the product of two
input bits. In particular this nonlinear component is zero with probability 3/4.
The idea is then to treat this nonlinear component as random noise. To achieve
this we differentiate the encryption function $F$ twice. So the first ASA layers of
$F''$ yield a constant; then ASAS is a noisy constant due to the weak nonlinearity;
and ASASA is a noisy constant accessed through $A^z$. This allows us to reduce the
problem of recovering $A^z$ to (a close variant of) an LPN instance with tractable
parameters.

We now describe the attack in detail. First, pick two distinct random differences
$\delta_1, \delta_2 \in \{0,1\}^n$. Then compute the order 2 differential of the encryption
function along these two differences. That is, let $F'' = \partial^2 F / \partial\delta_1 \partial\delta_2$. This
second-order differential is constant at the output of $F^{y'} = A^y \circ \chi \circ A^x$, since χ has
degree only two:

$$(F^{y'})''(x) = \partial^2 F^{y'} / \partial\delta_1 \partial\delta_2 = C(\delta_1, \delta_2)$$
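
The fact used here, that the second-order differential of a function of degree at most two does not depend on $x$, can be verified directly. The sketch below builds a random quadratic Boolean function on 8 variables (a stand-in for one output bit of $F^{y'}$, not the scheme itself) and evaluates its second derivative at every point, for two arbitrary fixed differences.

```python
import random
from itertools import combinations

rng = random.Random(3)
n = 8
# Random Boolean function of degree at most 2: constant, linear and quadratic terms.
const = rng.getrandbits(1)
linear = rng.getrandbits(n)
quadratic = [(i, j) for i, j in combinations(range(n), 2) if rng.getrandbits(1)]

def f(x: int) -> int:
    value = const ^ (bin(linear & x).count("1") & 1)
    for i, j in quadratic:
        value ^= (x >> i) & (x >> j) & 1
    return value

def second_derivative(x: int, d1: int, d2: int) -> int:
    """Order-2 differential of f at x along the differences d1 and d2."""
    return f(x) ^ f(x ^ d1) ^ f(x ^ d2) ^ f(x ^ d1 ^ d2)

d1, d2 = 0b00010111, 0b10100001
values = {second_derivative(x, d1, d2) for x in range(1 << n)}
print(values)
```

The set contains a single element: the constant written $C(\delta_1, \delta_2)$ above, restricted to one output bit.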

Now if we look at a single bit at the output of $F^z = \chi \circ F^{y'}$, we have:

$$(F^z)''_i(x) = (F^{y'})''_i(x) + \overline{F^{y'}_{i+1}}\, F^{y'}_{i+2}(x) + \overline{F^{y'}_{i+1}}\, F^{y'}_{i+2}(x + \delta_1)$$
$$\qquad + \overline{F^{y'}_{i+1}}\, F^{y'}_{i+2}(x + \delta_2) + \overline{F^{y'}_{i+1}}\, F^{y'}_{i+2}(x + \delta_1 + \delta_2) \qquad (2)$$

That is, a bit at the output of $(F^z)''$ still sums up to a constant, plus the sum of
four bit products. If we look at each product as an independent random binary
variable that is zero with probability 3/4, i.e. bias $2^{-1}$, then by the Piling-up
Lemma (Lemma 1) the sum is equal to zero with bias $2^{-4}$.
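
Under the independence model just described, the bias $2^{-4}$ can be confirmed by enumerating the $2^8$ equally likely assignments of the eight bits entering the four products. This sketch only covers the idealized independent model; the bias is taken as $2p - 1$, where $p$ is the probability of the sum being zero, which is the convention matching the figures in the text.

```python
from itertools import product

def noise(bits) -> int:
    """Sum over GF(2) of four products of two bits each."""
    w1, w2, x1, x2, y1, y2, z1, z2 = bits
    return (w1 & w2) ^ (x1 & x2) ^ (y1 & y2) ^ (z1 & z2)

zeros = sum(1 for bits in product((0, 1), repeat=8) if noise(bits) == 0)
p_zero = zeros / 2 ** 8
bias = 2 * p_zero - 1
print(zeros, p_zero, bias)
```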

Experiments show that modeling the four products as independent is not 
quite accurate: a significant discrepancy is introduced by the fact that the four 
inputs of the products sum up to a constant. For the sake of clarity, we will 
disregard this for now and pretend that the four products are independent. We 
will come back to this issue later on. 

Now a single linear layer remains between $(F^z)''$ and $F''$. Let $s_i \in \{0,1\}^n$
be the linear mask such that $\langle F|s_i\rangle = F^z_i$ (once again we assume $C^z = 0$, and
postpone taking $C^z$ into account until step 4 of the attack). Then $\langle F''|s_i\rangle$ is
equal to a constant with bias $2^{-4}$. Now let us compute $N$ different outputs of
$F''$ for some $N$ to be determined later, which costs $4N$ calls to the encryption
function $F$. Let us stack these $N$ outputs in an $N \times n$ matrix $A$.

Then we know that $A \cdot s_i$ is either the all-zero or the all-one vector (depending
on $(F^{y'})''_i$) plus a noise of bias $2^{-4}$. Thus finding $s_i$ is essentially an LPN problem
with dimension $n = 127$ and bias $2^{-4}$ (i.e. noise $1/2 + 2^{-5}$). Of course this is not
quite an LPN instance: $A$ is not uniform, there are $n$ solutions instead of one, and
there is no output vector $b$ (although we could isolate the last column of $A$ and
define it as the output vector). However in practice none of this should hinder
the performance of a BKW algorithm [BKW03]. Thus we make the heuristic
assumption that BKW performs here as it would on a standard LPN instance
(see footnote 7).

In the end, we recover the masks $s_i$ such that $z_i = \langle F|s_i\rangle$. Before moving on to
the next stage of the attack, we go back to the earlier independence assumption.


Dependency Between the Four Products. In the reasoning above, we have
modeled the four bit products in Eq. 2 as independent binary random variables
with bias $2^{-1}$. That is, we assumed the four products would behave as:

$$\Pi = W_1 W_2 + X_1 X_2 + Y_1 Y_2 + Z_1 Z_2$$

where $W_i, X_i, Y_i, Z_i$ are uniformly random independent binary variables. This
yields an expectancy $E[\Pi]$ with bias $2^{-4}$. As noted above, this is not quite
accurate, and we now provide a more precise model that matches with our
experiments.


[Footnote 7] To the best of our knowledge, we have yet to see an LPN-like problem
with a matrix $A$ on which BKW underperforms significantly compared to the uniform
case, unless the problem was specifically crafted for this purpose. The existence
of multiple solutions is also a notable difference in our case. However in a classic
application of BKW with a fast Fourier transform at the end, this only means that
the Fourier transform will output several solutions. Note that the dimension of
the Fourier transform will be close to $127/3 \approx 42$ [LF06], and we have only
$\approx 2^{14}$ solutions, so they are distinct on their last 42 bits with very high probability.

Since $F^{y'}$ has degree two, $(F^{y'})''$ is a constant, dependent only on $\delta_1$ and $\delta_2$.
This implies that in the previous formula, we have $W_1 + X_1 + Y_1 + Z_1 = (F^{y'})''_{i+1}$
and $W_2 + X_2 + Y_2 + Z_2 = (F^{y'})''_{i+2}$. To capture this, we look at:

$$E(c_1, c_2) = E[\Pi \mid W_1 + X_1 + Y_1 + Z_1 = c_1,\ W_2 + X_2 + Y_2 + Z_2 = c_2]$$

It turns out that $E(0, 0)$ has a stronger bias, close to $2^{-3}$; while perhaps
surprisingly, $E(c_1, c_2)$ for $(c_1, c_2) \ne (0, 0)$ has bias zero, and is thus not suitable for our
attack. Since $(F^{y'})''$ is essentially random, this means that our technique will work
for only a fraction 1/4 of output bits. However, once we have recovered these
output bits, we can easily change $\delta_1, \delta_2$ to obtain a new value of $(F^{y'})''$ and start
over to find new output bits.

After $k$ iterations of the above process, a given bit at position $i < 127$ will
have probability $(3/4)^k$ of remaining undiscovered. In order for all 103 unperturbed
bits to be discovered with good probability, it is thus enough to perform
$k = -\log(103)/\log(3/4) \approx 16$ iterations.
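
The iteration count can be reproduced numerically. The snippet below evaluates the formula for $k$ and the expected number of unperturbed bits still missing after $k$ rounds, assuming as in the text that each round independently reveals a given bit with probability 1/4; variable names are ours.

```python
from math import log

unperturbed = 103
k_exact = -log(unperturbed) / log(3 / 4)       # solves 103 * (3/4)^k = 1
k = round(k_exact)
expected_missing = unperturbed * (3 / 4) ** k  # expected number of undiscovered bits

print(k_exact, k, expected_missing)
```

With this choice of $k$ the expected number of undiscovered bits is about one, so a few extra iterations may be run if some masks are still missing.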

In the end we recover all linear masks $a_i$ corresponding to unperturbed bits at
the output of the second χ layer; i.e. $a_i = ((A^z)^T)^{-1} e_i$ for $0 \le i < n - p$. The $a_i$’s
can then be ordered into a chain $(a_0, \dots, a_{n-p-1})$ like in Sect. 5.1: neighbouring
$a_i$’s are characterized by the fact that $\langle F|a_i\rangle\langle F|a_{i+1}\rangle$ has degree 6. We postpone
distinguishing between $(a_0, \dots, a_{n-p-1})$ and $(a_{n-p-1}, \dots, a_0)$ until Sect. 5.3.

Complexity Analysis. According to [LF06, Theorem 2], the number of samples
needed to solve an LPN instance of dimension 127 and bias $2^{-4}$ is $N = 2^{44}$
(attained by setting $a = 3$ and $b = 43$). This requires $4N = 2^{46}$ encryptions.
Moreover the dominant cost in the time complexity is to sort the $2^{44}$ samples $a$
times, which requires roughly $3 \cdot 44 \cdot 2^{44} < 2^{52}$ basic operations. Finally, as noted
above, we need to iterate the process 16 times to recover all unperturbed output
bits with good probability, so our overall time complexity is increased to $2^{56}$
for BKW, and $2^{50}$ encryptions to gather samples (slightly less with a structure
sharing some plaintexts between the 16 iterations).
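
The arithmetic of this complexity estimate is summarized below. The sample count $N = 2^{44}$ is taken from the text as quoted from [LF06]; the snippet only multiplies out the remaining factors, with variable names of our choosing.

```python
from math import log2

samples = 2 ** 44               # N, as quoted from [LF06, Theorem 2] with a = 3, b = 43
bkw_rounds, iterations = 3, 16  # BKW parameter a, and repetitions over (delta_1, delta_2)

encryptions_per_run = 4 * samples            # four encryptions per second-order differential
sort_cost = bkw_rounds * 44 * samples        # sorting N samples a times
total_encryptions = iterations * encryptions_per_run
total_sort_cost = iterations * sort_cost

print(log2(encryptions_per_run), log2(sort_cost),
      log2(total_encryptions), log2(total_sort_cost))
```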

5.3 Peeling Off the Remaining ASAS layers 

Using either the algebraic attack from Sect. 5.1 or the LPN-based attack from
Sect. 5.2, we have recovered the ordered chain $(a_0, \dots, a_{n-p-1})$ of linear masks
such that $z_i = \langle F|a_i\rangle$. More exactly we have recovered either $(a_0, \dots, a_{n-p-1})$
or $(a_{n-p-1}, \dots, a_0)$. For simplicity assume we have recovered $(a_0, \dots, a_{n-p-1})$.
We will be able to distinguish between the two cases later on.

Essentially, this means we have peeled off the last affine layer $A^z$, or more
accurately its linear component, over the unperturbed bits. Note that we cannot
hope to recover $A^z$ over perturbed bits, as perturbed bits are by definition
uniformly random polynomials of degree 4, and a linear combination of uniformly
random polynomials of degree 4 is still a uniformly random polynomial
of degree 4. In other words, the perturbation is essentially defined modulo affine
equivalence.

We now move on to peeling off the remaining layers one by one. We point
out once again that all steps below have been verified experimentally.


Step 4: from ASAS to ASA. The next layer we wish to peel off is a χ layer,
which is entirely public. It may seem that applying $\chi^{-1}$ should be enough. The
difficulty arises from the fact that we do not know the full output of χ, but
only $n - p$ bits. Furthermore, if our goal was merely to decrypt some specific
ciphertext, we could use other techniques, e.g. the fact that guessing one bit at
the input of χ produces a cascade effect that allows recovery of all other input
bits from output bits, regardless of the fact that the function has been truncated
[Dae95]. However our goal is different: we want to recover the secret key, not just
be able to decrypt messages. For this purpose we want to cleanly recover the
input of χ in the form of degree 2 polynomials, for every unperturbed bit. We
propose a technique to achieve this below.

From the previous step, we are in possession of (ao, . . . ,a n _ p _i) as defined 
above. Since by definition Zi = (F\di), this means we know Zi for 0 < i < n — p. 
Note that y[ has degree only 2, and we know that Zi = y[ + ^ + i^ +2 - In order to 
reverse the x layer, we set out to recover ^,^ +1 ,^ +2 f rom knowledge of only 
Zi, by using the fact that ^,^ +1 ,^ +2 are quadratic. 

This reduces to the following problem: given $P = A + B \cdot C$, where $A, B, C$ are
degree-2 polynomials, recover $A, B, C$. A closer look reveals that this problem
is not possible exactly as stated, because, writing $\overline{X} = X + 1$, $P$ can be
equivalently written in four different ways as: $A + B \cdot C$, $A + B + B \cdot \overline{C}$,
$A + C + \overline{B} \cdot C$, $\overline{A + B + C} + \overline{B} \cdot \overline{C}$. On
the other hand, we assume that for uniformly random $A, B, C$, the probability
that $P$ may be written in some unrelated way, i.e. $P = C + D \cdot E$ for $C, D, E$
distinct from the previous four cases, is overwhelmingly low. This situation has
never occurred in our experiments. Thus our problem reduces to:

Problem 1. Given $P = A + B \cdot C$, where $A, B, C$ are degree-2 polynomials, recover
degree-2 polynomials $A', B', C'$ such that $P = A' + B' \cdot C'$.

Our previous assumption says $A' \in \mathrm{span}\{A, B, C, 1\}$; $B', C' \in \mathrm{span}\{B, C, 1\}$.
A straightforward approach to tackle this problem is to write $B$ formally as
a generic degree-2 polynomial with unknown coefficients. This gives us
$k = 1 + n + n(n+1)/2 \approx n^2/2$ binary unknowns. Then we observe that $B \cdot P$ has
degree only 4 (since $B^2 = B$, so that $B \cdot P = A \cdot B + B \cdot C$). Each term of degree 5
in $B \cdot P$ must have a zero coefficient, and thus each term gives us a linear
constraint on the unknown coefficients of $B$. Collecting the constraints takes up
negligible time, at which point we have a $k \times k$ matrix whose kernel is
$\mathrm{span}\{B, C, 1\}$. This gives us a few possibilities for $B', C'$, which we can filter
by checking that $A' = P - B' \cdot C'$ has degree 2. The complexity of this approach
boils down to inverting a $k$-dimensional binary matrix, which costs essentially
$k^3$ basic linear operations. In our case this amounts to $2^{39}$ basic linear
operations. In the full version of this article (cf. Sect. 1.3), we present a more
elaborate, but faster algorithm to solve Problem 1.

At this point, we have essentially removed the first two ASASA layers (assuming
$C^z = 0$, but this actually has no impact up to this point). More work is
required to fully recover the layers, and analyze the remaining ASA layers. However
the core of the attack is over. A detailed description of the remaining steps
to fully recover the remaining layers is provided in the full version of this article
(see Sect. 1.3).
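
The straightforward linear-algebra approach to Problem 1 described above can be illustrated on a toy instance. The sketch below is not the authors' implementation: it assumes $n = 8$ variables, random quadratic $A, B, C$, polynomials stored in algebraic normal form as sets of monomial bitmasks, and it imposes a zero coefficient on every monomial of degree 5 or more in $X \cdot P$. All function names are hypothetical.

```python
import random
from itertools import combinations

def mul(p, q):
    """Product over F_2 of two polynomials in ANF (sets of monomial bitmasks)."""
    out = set()
    for a in p:
        for b in q:
            out ^= {a | b}              # x_i * x_i = x_i, coefficients mod 2
    return out

def deg(p):
    return max((bin(m).count("1") for m in p), default=-1)

def quad_monomials(n):
    """All monomials of degree <= 2 in n variables, as bitmasks."""
    return ([0] + [1 << i for i in range(n)]
            + [(1 << i) | (1 << j) for i, j in combinations(range(n), 2)])

def kernel_of_high_terms(P, monos):
    """Kernel vectors (bitmasks over `monos`) of the linear map sending X to the
    monomials of degree >= 5 in X * P, found by Gaussian elimination."""
    basis, kernel = {}, []              # pivot monomial -> (vector, combination)
    for j, m in enumerate(monos):
        vec = {t for t in mul({m}, P) if bin(t).count("1") >= 5}
        combo = 1 << j
        while vec and max(vec) in basis:
            bvec, bcombo = basis[max(vec)]
            vec, combo = vec ^ bvec, combo ^ bcombo
        if vec:
            basis[max(vec)] = (vec, combo)
        else:
            kernel.append(combo)        # this combination of monomials is a solution
    return kernel

def in_span(v, gens):
    """True if bitmask v is an F_2-linear combination of the bitmasks in gens."""
    pivots = {}
    for g in list(gens) + [v]:
        while g and g.bit_length() in pivots:
            g ^= pivots[g.bit_length()]
        if g:
            pivots[g.bit_length()] = g
    return g == 0

rng = random.Random(2015)               # toy parameters, chosen for illustration
n = 8
monos = quad_monomials(n)
A, B, C = ({m for m in monos if rng.random() < 0.5} for _ in range(3))
P = A ^ mul(B, C)                       # P = A + B*C over F_2

kernel = kernel_of_high_terms(P, monos)
to_vec = lambda X: sum(1 << j for j, m in enumerate(monos) if m in X)
print("kernel dimension:", len(kernel))
print("B, C, 1 in kernel:", all(in_span(to_vec(X), kernel) for X in (B, C, {0})))
print("filter accepts (B, C):", deg(P ^ mul(B, C)) <= 2)
```

For such a small $n$ the kernel may be larger than $\mathrm{span}\{B, C, 1\}$; the degree filter on $A' = P - B' \cdot C'$ is what removes spurious candidates.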

6 A Practical Attack on White-Box ASASA 

In this section we show that the actual security of small-block ASASA ciphers is
much lower than was estimated by Biryukov et al. We describe a procedure that
attempts to recover the secret components of the structure, thus breaking the
weak white-box security notion (Definition 2). Our algorithm relies rather heavily
on heuristics, and evaluating its efficiency requires actual implementation. We
focused on two instances, the 16-bit $\mathrm{ASASA}_{16}$ with claimed security of 64 bits
and the 20-bit $\mathrm{ASASA}_{20}$ with claimed security of 100 bits. A straightforward
implementation of our algorithm is able to recover the secret components of the
16-bit instance in under a minute and of the 20-bit instance in a few hours, when
running on a standard PC. We recall that the source code is publicly available
(see Sect. 1.3). For the remainder of the section, we implicitly use the 16-bit
instance when describing the attack.


6.1 Attack Overview 

Our general black-box attack from Sect. 4 does not apply, because the block 
size is too small to allow computing cubes of dimension 50. On the other hand, 
the small block size makes it possible to compute the distribution of output 
differences for a single input difference in very reasonable time. For instance, 
one can compute and store the entire difference distribution table (DDT) of a 
16-bit cipher in under a second using just a standard PC. 

Remark 1. Our attack makes use of the full codebook of the ciphers, which in 
general may be seen as a very strong requirement. This is however only natural in 
the case of attacking white-box implementations, as the user is actually required 
to be given the full codebook of the super S-boxes as part of the implementation. 

From the results of Biryukov and Shamir [BS01], it is already enough to recover
only one of the external affine (or linear) layers in order to break the security
of ASASA. Indeed, this allows reducing the cipher to either ASAS or SASA,
which can then be attacked in practical time using their method. Thus we focus
on removing the first linear layer. In accordance with the opening remarks of
Sect. 4.1, this amounts to finding the image space of each S-box through $(A^x)^{-1}$.

The general idea of the attack is to create an oracle able to recognize whether
an input difference $\delta$ activates one or two S-boxes in the first S-box layer $S^x$.
More accurately, we create a ranking function $\mathcal{F}$ such that $\mathcal{F}(\delta)$ is expected to
be significantly higher if $\delta$ activates only one S-box rather than two. We propose
two choices for $\mathcal{F}$.

Both choices begin by computing the entire output difference distribution
$D(\delta)$ for the input difference $\delta$, i.e. the row corresponding to $\delta$ in the DDT.
Then the value of $\mathcal{F}(\delta)$ is computed from $D(\delta)$. Choices for $\mathcal{F}$ are heuristic, but
experiments show they are quite efficient. We now present our two choices for $\mathcal{F}$.


Walsh Transform. The idea behind this version of the attack is quite intuitive.
If $\delta$ activates only one S-box, then after the first SA layers, two inner states
computed from any two plaintexts with input difference $\delta$ are equal on the output
of the inactive S-box. Hence after the first ASA layers, they are equal along $2^8 - 1$
non-zero linear masks. Since these masks only traverse a single S-box layer before
the output of the cipher, linear cryptanalysis [Mat94] tells us that we can expect
some linear masks to be biased at the output of the cipher. On the other hand
if both S-boxes are active in the first round, no such phenomenon occurs, and
linear biases on the output differences are expected to be weaker.

In order to measure this difference, we propose to compute, for every output
mask $a$, the value $f(a) = \big(\sum_{x \in \{0,1\}^{16}} \langle \tfrac{\partial F}{\partial \delta}(x) \,|\, a \rangle\big) - 2^{15}$ (where the sum is
computed in $\mathbb{Z}$). That is, $2^{-15} f(a)$ is the bias of the output differences $D(\delta)$
along mask $a$. The function $f$ can be computed efficiently, since it is precisely
the Walsh transform of the characteristic function of $D(\delta)$, and we can use a fast
Fourier transform algorithm. Then as a ranking function $\mathcal{F}$ we simply choose
$\max(f)$, i.e. the highest bias among all output masks.
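
A minimal sketch of this Walsh-based ranking follows, assuming the cipher is given as a full lookup table `F` of length $2^n$ (hypothetical name). It takes the largest absolute value of $f$ over non-zero masks, which is one possible reading of "highest bias"; this choice is an assumption rather than something the text specifies.

```python
def fwht(v):
    """In-place fast Walsh-Hadamard transform of a list of length 2^n."""
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v

def walsh_rank(F, delta):
    """Largest |f(a)| over non-zero masks a, where
    f(a) = sum_x <F(x) ^ F(x ^ delta) | a> - len(F) / 2."""
    size = len(F)
    counts = [0] * size                      # the row D(delta) of the DDT
    for x in range(size):
        counts[F[x] ^ F[x ^ delta]] += 1
    W = fwht(counts)                         # W[a] = sum_x (-1)^<diff(x) | a>
    # sum_x <diff(x) | a> = (size - W[a]) / 2, hence f(a) = -W[a] / 2.
    return max(abs(w) // 2 for w in W[1:])

# Toy check on a linear "cipher": every output difference equals delta,
# so the bias is maximal for every mask.
print(walsh_rank(list(range(16)), 3))
```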


Number of Collisions. It turns out that performing the Walsh transform is
not truly necessary. Indeed, the number of collisions in $D(\delta)$ is higher when $\delta$
activates only 1 S-box; where by number of collisions we mean $2^{15}$ minus the
number of distinct values in $D(\delta)$. This may be understood as a consequence
of the fact that whenever $\delta$ activates a single S-box, only $2^7$ output differences
are possible after the first ASA layers; and depending on the properties of the
active (random) S-box, the distribution between these differences may be quite
uneven. Whereas if both S-boxes are active, $2^{15}$ differences are possible and the
distribution is expected to be less skewed. Thus we pick as ranking function $\mathcal{F}$
the number of collisions in $D(\delta)$ in the previous sense.
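
The collision count, and the way a ranking is then used to pick linearly independent top-ranked differences, can be sketched as follows. This is a toy illustration under the assumption that the cipher is available as a lookup table `F`; the helper names are hypothetical and the demonstration uses a trivial 4-bit table rather than an ASASA instance.

```python
def collision_rank(F, delta):
    """len(F)/2 minus the number of distinct output differences for delta."""
    diffs = {F[x] ^ F[x ^ delta] for x in range(len(F))}
    return len(F) // 2 - len(diffs)

def select_independent(candidates, count):
    """Greedily keep the first `count` candidates that are linearly
    independent over F_2 (candidates are integers read as bit vectors)."""
    basis, chosen = {}, []                   # highest set bit -> basis vector
    for d in candidates:
        v = d
        while v and v.bit_length() in basis:
            v ^= basis[v.bit_length()]
        if v:
            basis[v.bit_length()] = v
            chosen.append(d)
            if len(chosen) == count:
                break
    return chosen

def top_independent_differences(F, count):
    """Rank all non-zero differences, then keep the best independent ones."""
    ranked = sorted(range(1, len(F)),
                    key=lambda d: collision_rank(F, d), reverse=True)
    return select_independent(ranked, count)

print(collision_rank(list(range(16)), 3))
print(select_independent([3, 1, 2, 4], 3))
```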

Once we have chosen a ranking function $\mathcal{F}$, we simply compute the ranking
of every possible input difference, sort the differences, and choose the highest 16
linearly independent differences according to our ranking. Our hope is that these
differences only activate a single S-box. In a second step, we will group together
differences that activate the same S-box. A more detailed description of the
attack, together with a discussion of the results, is provided in the full version
of this article (see Sect. 1.3).

7 Conclusion

We presented a new algebraic attack able to efficiently break both the $\chi$-based
public-key cryptosystem and the secret-key scheme of [BBK14]. In addition we
proposed another attack that heuristically reduces the key-recovery problem on
the $\chi$ scheme to an easy instance of LPN. In the case of the public-key scheme,
both attacks go through regardless of the amount of perturbation. For both
schemes, the attacks are quite structural (in the case of the black-box scheme, it
is in fact structural in the sense of [BS01]), and seem difficult to patch. Finally,
although the general attack on the black-box scheme does not carry over to the
small-block instances used for white-box designs, we also showed a very efficient
dedicated attack on some of the small-block instances, casting doubt on their
general suitability for that purpose.

Abstract. The security of pairing-based crypto-systems relies on the
difficulty to compute discrete logarithms in finite fields $\mathbb{F}_{p^n}$ where $n$ is
a small integer larger than 1. The state-of-the-art algorithm is the number
field sieve (NFS) together with its many variants. When $p$ has a special
form (SNFS), as in many pairing constructions, NFS has a faster variant
due to Joux and Pierrot. We present a new NFS variant for SNFS
computations, which is better for some cryptographically relevant cases,
according to a precise comparison of norm sizes. The new algorithm is an
adaptation of Schirokauer's variant of NFS based on tower extensions,
for which we give a middlebrow presentation.


Keywords: Discrete logarithm • Number field sieve • Pairings 


1 Introduction 

The discrete logarithm problem (DLP) in finite fields is a central topic in public
key cryptography. The case of $\mathbb{F}_{p^n}$ where $p$ is prime and $n$ is a small integer
greater than 1, albeit less studied than the prime case, is at the foundation
of pairing-based cryptography. The number field sieve (NFS) started life as a
factoring algorithm but was rapidly extended to compute discrete logarithms in
[19,20,33] and has today a large number of variants. In 2000 Schirokauer [34]
proposed the tower number field sieve (TNFS), as the first variant of NFS to
solve DLP in fields $\mathbb{F}_{p^n}$ with $n > 1$. When $n$ is fixed and the field cardinality
$Q = p^n$ tends to infinity, he showed that TNFS has the heuristic complexity
$L_Q(1/3, \sqrt[3]{64/9})$, where

$$L_Q(\alpha, c) = \exp\big((c + o(1))(\log Q)^{\alpha}(\log\log Q)^{1-\alpha}\big).$$
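
To get a feeling for the $L_Q$ notation, the following sketch evaluates it numerically with the $o(1)$ term dropped, which is an assumption that makes the output only a rough indication. The function name `log2_L` is illustrative, and the result is returned in bits to avoid overflow.

```python
from math import log

def log2_L(q_bits, alpha, c):
    """log2 of L_Q(alpha, c) for a Q of q_bits bits, ignoring the o(1) term."""
    ln_q = q_bits * log(2)
    return c * ln_q**alpha * log(ln_q)**(1 - alpha) / log(2)

# alpha = 1 is fully exponential (Q^c); alpha = 0 is polynomial ((log Q)^c);
# NFS-type algorithms sit at alpha = 1/3.
print(log2_L(1024, 1, 0.5))
print(log2_L(1024, 1/3, (64 / 9) ** (1 / 3)))
```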

Schirokauer explicitly suggested that his algorithm might be extended to arbitrary
fields $\mathbb{F}_{p^n}$ with $p = L_{p^n}(\alpha, c)$ and $\alpha > 2/3$, while maintaining the same
complexity. Another question that he raised was whether his algorithm could
take advantage of a situation where the prime $p$ has a special SNFS shape,
namely if it can be written $p = P(u)$ for an integer $u \approx p^{1/d}$ and a polynomial
$P \in \mathbb{Z}[x]$ of degree $d$, with coefficients bounded by an absolute constant. By that
time, even for prime fields the answer was not obvious.

In 2006 Joux, Lercier, Smart and Vercauteren [21] presented a new variant
of NFS which applies to all finite fields $\mathbb{F}_{p^n}$ with $p = L_Q(\alpha, c)$ for some $\alpha > 1/3$
and $c > 0$, the JLSV algorithm. When $\alpha > 2/3$, their variant has complexity
$L_Q(1/3, \sqrt[3]{64/9})$. The question of extending TNFS to arbitrary finite fields
became obsolete, because, in case of a positive answer, it would have the same
complexity as the JLSV algorithm.

In 2013 Joux and Pierrot [22] designed another variant of NFS which applies
to non-prime fields $\mathbb{F}_{p^n}$ where $p$ is an SNFS prime. Their algorithm has complexity
$L_Q(1/3, \sqrt[3]{32/9})$, which is the same as that of Semaev's SNFS algorithm
for prime fields [35]. It shows that the pairing-based crypto-systems which use
primes of a special form are more vulnerable to NFS attacks than the general
ones. With this SNFS algorithm, the second question of Schirokauer lost its
appeal as well, because this is the complexity that one can expect if Schirokauer's
algorithm can be adapted when $p$ is an SNFS prime.

In 2014 Barbulescu, Gaudry, Guillevic and Morain improved the algorithm
in [21] and set a record computation in a field $\mathbb{F}_{p^2}$ of 180 decimal digits. However,
since their improvements do not apply to SNFS fields and since the algorithm
of Joux and Pierrot was never implemented, it is important to find a practical
algorithm for this case.

In this work, we wish to rehabilitate Schirokauer's TNFS algorithm. First,
we show that indeed, the heuristic complexity carries over to the expected range
of finite fields. In order to make this analysis, we restate the original TNFS with
fewer technicalities than in the original presentation, taking advantage of tools
that were invented later (virtual logarithms).

We also show that for extension fields based on SNFS primes, the complexity
of TNFS drops as expected to $L_Q(1/3, \sqrt[3]{32/9})$.

Finally, going beyond the asymptotic formulae, we compute estimates that 
strongly suggest that TNFS is currently the most efficient algorithm for solving 
discrete logarithms in small degree extensions of SNFS prime fields, like the ones 
arising naturally in several pairing constructions. 

Outline. After a brief description of Schirokauer’s TNFS algorithm in Sect. 2, 
we present it with sufficiently many details to get a proper asymptotic analysis 
in Sect. 3. In Sect. 4, several variants are described and analyzed, in particular 
the SNFS variant. This is followed, in Sect. 5, by more precise estimates for
cryptographically relevant sizes and comparisons with other methods. Further 
technicalities about TNFS are given in an appendix; these are mostly details that 
could be useful for an implementation but which do not change the complexities. 

2 Overview of TNFS 

To fix ideas, we consider the case of “large” characteristic, so that we target
fields $\mathbb{F}_Q$ with $Q = p^n$ such that $p = L_Q(\alpha, c)$ for some constants $\alpha > 2/3$ and
$c > 0$.

Pohlig and Hellman explained how to retrieve the discrete logarithm modulo
the group order $N$ from the value of the discrete logarithms modulo each prime
factor $\ell$ of $N$. Furthermore, Pollard's rho algorithm allows computing discrete
logarithms for small primes. Hence it is enough to explain how to use NFS
to compute discrete logarithms modulo prime factors $\ell$ of $\#\mathbb{F}_{p^n}^*$ larger than
$L_{p^n}(1/3, c)$ for some $c > 0$.

A classical variant of the NFS algorithm, e.g. one of the variants used for
factoring and DLP in prime fields, would involve two irreducible polynomials $f$
and $g$ in $\mathbb{Z}[x]$ which have a common irreducible factor of degree $n$ modulo $p$.
Here, in TNFS, we consider two polynomials $f$ and $g$ defined over a ring $R$
which is of the form $R = \mathbb{Z}[t]/(h(t))$ for a monic irreducible polynomial $h$ of
degree $n$. We ask furthermore that $h$ remains irreducible modulo $p$, so that there
is a unique ideal $\mathfrak{p}$ above $p$ in $R$. Finally, we require that $f$ and $g$ are irreducible
over $\mathbb{Q}[t]/(h(t))$ and have a common root modulo $p$ in $R$.

In the rest of the article, we denote by $K_f$ the number field defined by $f$, and
by $K_g$ the one defined by $g$. Also we write $\mathbb{Q}(\iota)$ for the number field defined
by $h$, so that $K_f$ and $K_g$ are as in the figure aside.

[Figure: tower of number fields, with $K_f$ and $K_g$ both above $\mathbb{Q}(\iota)$, itself above $\mathbb{Q}$.]

The conditions imposed on $f$, $g$ and $h$ are such that there exist two ring
homomorphisms from $R[x]$ to $R/\mathfrak{p} = \mathbb{F}_{p^n}$, one going through $R[x]/f(x)$, and the
other through $R[x]/g(x)$, and for any polynomial in $R[x]$, the resulting values
in $\mathbb{F}_{p^n}$ coincide, so that we get a commutative diagram as in the classical NFS
algorithm. In Fig. 1, we recall this diagram, where we have denoted by $\alpha_f$ (resp.
$\alpha_g$) a root of $f$ (resp. of $g$) and by $m$ the common root of $f$ and $g$ modulo $p$
in $R$. These notations will be used all along the article.

Among the constructions that we tried, the best one uses polynomials $f$ and
$g$ with coefficients in $\mathbb{Z}$, so that $K_f$ and $K_g$ can each also be seen as a compositum of
two fields. If one could find a construction where $f$ and $g$ have coefficients in $R$
one might find a faster algorithm. In any case, it is interesting to consider $f$ and
$g$ as polynomials in $R[x]$, since this makes it easier to follow the analogy with
the classical NFS.

[Figure: diagram with $R[x]$ at the top, mapping to $R[x]/(f(x)) \subset K_f$ on one side and to $R[x]/(g(x)) \subset K_g$ on the other.]

Fig. 1. Commutative diagram of TNFS for discrete logarithm in $\mathbb{F}_{p^n}$. In the classical
case, $R = \mathbb{Z}$; here $R = \mathbb{Z}[\iota]$ is a subring of a number field of degree $n$ where $p$ is inert.

Once this setting is done, the TNFS algorithm proceeds as usual. For many
polynomials $a(\iota) - b(\iota)x$ in $R[x]$, we consider their two images in $R[x]/f(x)$
and $R[x]/g(x)$, and test them for smoothness as ideals. Each time the images
are simultaneously smooth, we can write a relation: modulo the usual complications
with principality defects and units that can be handled with the help
of Schirokauer maps, it is possible to convert a relation into a linear relation
between virtual logarithms of the factor base elements. Then follows a sparse
linear algebra step to deduce the values of these virtual logarithms. And finally,
the logarithm of an individual element of $\mathbb{F}_{p^n}$ can be computed using a descent
step.

In the next section, we will enter into details, define more precisely the factor 
base elements and the associated smoothness notion, and estimate the size of 
the objects involved in the computation. 

3 Detailed Description and Analysis 

3.1 Polynomial Selection 

In the overview of the previous section, nothing is said about the respective
degrees of $f$ and $g$. In fact, there is some freedom here, and we could in principle
have balanced degrees and use for instance the algorithm of [20] or we can use a
linear polynomial $g$, both methods leading to the same asymptotic complexity.
The only difference comes in the individual logarithm stage. In order to keep
the exposition short, we will only present this stage in the case where $g$ is linear,
but in practice one must take the one which minimizes the overall time.

To fix ideas, we take a linear polynomial $g$ and a polynomial $f$ with a degree
of the form

$$\deg f = d = \delta\,(\log Q/\log\log Q)^{1/3},$$

where the constant $\delta$ is to be fixed later, so that $f$ and $g$ have a common root
modulo $p$. They can be obtained by a simple base-$m$ algorithm applied to $p$,
yielding coefficients for $f$ and $g$ of size

$$\|f\|_\infty \approx \|g\|_\infty \approx p^{1/(d+1)},$$

where the infinity norm of a polynomial with integer coefficients denotes the
infinity norm of the vector formed with the coefficients of a polynomial.
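
The base-$m$ construction mentioned above can be sketched as follows. This is a minimal illustration, assuming $m$ is taken just above $p^{1/(d+1)}$ and ignoring the irreducibility check on $f$ that a real polynomial selection would perform; the function names are hypothetical.

```python
def iroot(x, k):
    """Largest integer r with r**k <= x, by binary search."""
    lo, hi = 0, 1 << (x.bit_length() // k + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid**k <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo

def base_m(p, d):
    """Return (f, g, m): f lists the base-m digits of p (f[i] multiplies x^i),
    so that f(m) = p and deg f <= d, and g = x - m."""
    m = iroot(p, d + 1) + 1            # guarantees p < m^(d+1)
    f, r = [], p
    while r:
        r, digit = divmod(r, m)
        f.append(digit)
    return f, [-m, 1], m

p = 2**61 - 1                           # a small Mersenne prime, for illustration
f, g, m = base_m(p, 3)
print(f, g, m)
# f(m) = p = 0 mod p and g(m) = 0, so m is a common root of f and g modulo p.
```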

In practice, instead of a naive base-$m$ approach, one can use any of the
methods known for the polynomial selection of NFS, when tackling prime fields
or integer factorization [3,4,13,23,24].

What is left is to select a polynomial $h$ of degree $n$ with small coefficients
which is irreducible modulo $p$. This is done by testing polynomials with small
coefficients and, heuristically, we succeed after $n$ trials, on average, because the
proportion of irreducible polynomials modulo $p$ is $\approx 1/n$. As we will explain
later, rather than having the polynomial $h$ with the smallest coefficients, we
might prefer some polynomial with slightly larger coefficients but with the additional
property that the Galois group of $h$ is cyclic of order $n$. For this, we test
polynomials in families with a cyclic Galois group; for example Foster [17] gives
a list of such families when $\deg h = 2, 3, 4, 5$ or $6$.
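
For the special case $n = 2$, the search for $h$ can be sketched with Euler's criterion: a monic quadratic is irreducible modulo an odd prime $p$ exactly when its discriminant is a quadratic non-residue. The code below is an illustration restricted to that case, with hypothetical function names and an arbitrary search bound; it does not address the cyclic Galois group condition.

```python
from itertools import product

def is_irreducible_quadratic(b, c, p):
    """t^2 + b*t + c is irreducible modulo an odd prime p iff its
    discriminant is a quadratic non-residue (Euler's criterion)."""
    disc = (b * b - 4 * c) % p
    return pow(disc, (p - 1) // 2, p) == p - 1

def find_small_h(p, max_size=5):
    """First monic quadratic with coefficients in [-size, size],
    trying the smallest sizes first; returns (c, b, 1) or None."""
    for size in range(1, max_size + 1):
        for b, c in product(range(-size, size + 1), repeat=2):
            if is_irreducible_quadratic(b, c, p):
                return (c, b, 1)        # coefficients of 1, t, t^2
    return None

print(find_small_h(7))
print(find_small_h(2**61 - 1))
```

About half of all monic quadratics are irreducible modulo $p$, in line with the $\approx 1/n$ proportion quoted above, so the search stops almost immediately.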

If one is interested in rigorous results and not in the most efficient polynomials,
then one can give a proof of existence based on Corollary 10 given in the
Appendix. Indeed, using cyclotomic fields one provably finds $h$ with coefficients
upper bounded by $(A n^B \log(pn)^C)^n$ for some effective constants $A$, $B$ and $C$.


3.2 Relation Collection 

In the top of the diagram of Fig. 1 one usually takes $a - bx$ with $a, b \in R$.
However, in the most general version of NFS one considers polynomials in $R[x]$
of arbitrary degrees; this is in particular necessary for the medium characteristic
case [21]. In our study, we did not find any case where it was advantageous to
consider polynomials of degree more than 1. Therefore we stick to the traditional
$(a, b)$-pairs terminology for designating a linear polynomial $a(\iota) - b(\iota)x$ in $R[x]$
that we consider as a candidate for producing a relation.

Ideals of Degree 1. In our case, just like in the classical NFS, only ideals of
degree 1 can occur in the factorizations of the elements in the number rings
(except maybe for a finite number of ideals dividing the discriminants). This is,
of course, only true when thinking in the relative extensions; we formalize this in
the following proposition that holds for $f$, but is also true for $g$ if it happens to
be non-linear.

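Proposition 1. Let $\mathbb{Q}(\iota)$ be a number field and let $\mathcal{O}_\iota$ be its ring of integers.
Let $f$ be a monic irreducible polynomial in $\mathcal{O}_\iota[x]$, and denote by $\alpha$ one of its
roots. We denote by $K_f = \mathbb{Q}(\iota, \alpha)$ the corresponding extension field, and $\mathcal{O}_f$ its
ring of integers.

If $\mathfrak{q}$ is a prime ideal of $\mathcal{O}_\iota$ not dividing the index-ideal $[\mathcal{O}_f : \mathcal{O}_\iota[\alpha]]$, then the
following statements hold.

(i) The prime ideals of $\mathcal{O}_f$ above $\mathfrak{q}$ are all the ideals of the form

$$\mathfrak{Q} = \langle \mathfrak{q}, T(\alpha) \rangle,$$

where $T(x)$ are the lifts to $\mathcal{O}_\iota[x]$ of the irreducible factors of $f$ in $\mathcal{O}_\iota/\mathfrak{q}[x]$.
Moreover $\deg \mathfrak{Q} = \deg T$.

(ii) If $a(t), b(t) \in \mathbb{Z}[t]$ are such that $\mathfrak{q}$ divides $N_{K_f/\mathbb{Q}(\iota)}(a(\iota) - b(\iota)\alpha)$ and
$a(\iota)\mathcal{O}_\iota + b(\iota)\mathcal{O}_\iota = \mathcal{O}_\iota$, then the unique ideal of $\mathcal{O}_f$ above $\mathfrak{q}$ which divides
$a(\iota) - b(\iota)\alpha$ is $\mathfrak{Q} = \langle \mathfrak{q}, \alpha - r(\iota) \rangle$ with $r \equiv a(\iota)/b(\iota) \pmod{\mathfrak{q}}$.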

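Proof. (i) This is Proposition 2.3.9 of [14].

(ii) Let $\mathfrak{Q} = \langle \mathfrak{q}, T(\alpha) \rangle$ be a prime ideal of $K_f$ above $\mathfrak{q}$ that divides $a(\iota) - b(\iota)\alpha$. If
$\mathfrak{Q}$ divides $b(\iota)$ then it also divides $a(\iota)$, and therefore we have a contradiction with
the condition $a(\iota)\mathcal{O}_\iota + b(\iota)\mathcal{O}_\iota = \mathcal{O}_\iota$. Therefore we can simplify $\mathrm{val}_{\mathfrak{Q}}(a(\iota) - b(\iota)\alpha)$
by dividing out by $b(\iota)$:

$$\mathrm{val}_{\mathfrak{Q}}(a(\iota) - b(\iota)\alpha) = \mathrm{val}_{\mathfrak{Q}}(b(\iota)) + \mathrm{val}_{\mathfrak{Q}}(a(\iota)/b(\iota) - \alpha) = \mathrm{val}_{\mathfrak{Q}}(\alpha - r(\iota)).$$

This expression is non-zero only when $\mathfrak{Q} = \langle \mathfrak{q}, \alpha - r(\iota) \rangle$, which proves the
statement.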


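Note that the coprimality condition is similar to the one we have in the classical
case, and the proportion of coprime pairs is

$$\prod_{\mathfrak{q} \text{ prime ideal in } \mathbb{Q}(\iota)} \left(1 - \frac{1}{N(\mathfrak{q})^2}\right) = \frac{1}{\zeta_{\mathbb{Q}(\iota)}(2)},$$

replacing $1/\zeta_{\mathbb{Q}}(2)$ in the classical variant.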

Factor Base. The consequence of this result is that we keep only the degree 1
ideals in the factor bases for each side. With the same notations as above, and
for a smoothness bound $B$, we define the factor base for $f$ by

$$\mathcal{F}_f(B) = \left\{ \begin{array}{l} \text{prime ideals of } \mathcal{O}_f \text{, coprime to } \mathrm{Disc}(K_f) \text{, of norm less than } B, \\ \text{whose inertia degree over } \mathbb{Q}(\iota) \text{ is one} \end{array} \right\}.$$

We define $\mathcal{F}_g(B)$ similarly; if $g$ is linear this is just the set of prime ideals of
$\mathcal{O}_\iota = \mathcal{O}_g$ of norm less than $B$. Prime ideals that divide the index-ideal
$[\mathcal{O}_f : \mathcal{O}_\iota[\alpha]]$ are not covered by Proposition 1, and can still occur in the factorization
of $(a(\iota) - b(\iota)\alpha)$. Moreover, since the index-ideal cannot be computed effectively,
we consider together all the ideals above $\mathrm{Disc}(f)$ and above the leading coefficient
of $f$. We denote them by $\mathcal{D}_f$ on the $f$-side, and $\mathcal{D}_g$ on the $g$-side. The cardinalities
of these sets are bounded by a polynomial in $\log Q$. Since Proposition 1 cannot
be used for detecting which elements of $\mathcal{D}_f$ divide $(a(\iota) - b(\iota)\alpha)$, we have to use
general algorithms, and again, we refer to [14].

Finally, we join the two factor bases and these exceptional ideals in the global
factor base defined by

$$\mathcal{F} = \mathcal{F}_f(B) \cup \mathcal{F}_g(B) \cup \mathcal{D}_f \cup \mathcal{D}_g.$$

We note that, as usual, the parameter $B$ will be chosen of the form
$B = L_Q(1/3, \beta)$, for a constant $\beta$ to be fixed later.

By the prime ideal theorem, the number of prime ideals in $\mathbb{Q}(\iota)$ of norm
less than $B$ is $\frac{B}{\log B}(1 + o(1))$. Using Chebotarev's density theorem, the average
number of roots of $f$ (resp. $g$) modulo a random prime ideal $\mathfrak{q}$ is one. Hence the
cardinality of the factor base is

$$\#\mathcal{F} = \frac{B}{\log B}(2 + o(1)),$$

which is similar to its value in the classical variant of NFS. As usual, in the
complexity analysis, we approximate $\#\mathcal{F}$ by the quantity $L_Q(1/3, \beta)$, since
polynomial-time factors are, in the end, hidden in the $o(1)$ added to the exponent
constant.

Finding Doubly-smooth $(a, b)$-pairs. Among various choices for the shape of
the $a(t)$ and $b(t)$ polynomials that we tried, the one giving the smallest norms is
that where $a$ and $b$ are of maximal degree, $n - 1$, and for which their coefficients
are all of more or less the same size.

Let us denote by $A$ a bound on these coefficients of $a(t)$ and $b(t)$. In the end,
it will be chosen to be just large enough so that we get enough relations to get
a full-rank system by browsing through all the possible coprime $(a, b)$-pairs of
degree at most $n - 1$ fitting this bound.

In order to estimate the probability that an $(a, b)$-pair gives a relation, the
first step is to bound the size of the absolute norms on the $f$- and the $g$-side.
The main tool is the following bound on the resultant.

Theorem 2 [10, Thm 7]. If $f, g \in \mathbb{C}[x]$ have degree $d_f$ and $d_g$, then

$$|\mathrm{Res}(f, g)| \le \|f\|_\infty^{d_g} \|g\|_\infty^{d_f} (d_f + 1)^{d_g/2} (d_g + 1)^{d_f/2}.$$

We can now give the formula for the bound on the norm. We write it with the
notations of the $f$-side, but it applies also to the $g$-side, after replacing the degree
$d$ by 1.

Theorem 3. Let $h$ and $f$ be monic irreducible polynomials over $\mathbb{Z}$ of respective
degrees $n$ and $d$. Let $K$ be the compositum of the number fields defined by $h$ and
$f$, and let $\iota$ and $\alpha_f$ be roots in $K$ of $h$ and $f$, respectively.

Let $a(t)$ and $b(t)$ be two polynomials of degree less than $n$ and with coefficients
bounded by $A$. Then, the absolute norm of the element $a(\iota) - b(\iota)\alpha_f$ of $K$ is
bounded by

$$|N_{K/\mathbb{Q}}(a(\iota) - b(\iota)\alpha_f)| \le A^{nd} \|f\|_\infty^{n} \|h\|_\infty^{d(n-1)} C(n, d), \qquad (1)$$

where $C(n, d) = (n + 1)^{(3d+1)n/2} (d + 1)^{3n/2}$.

Proof. We have $N_{K/\mathbb{Q}} = N_{\mathbb{Q}(\iota)/\mathbb{Q}} \circ N_{K/\mathbb{Q}(\iota)}$ and, since $f$ is monic, we get

$$N_{K/\mathbb{Q}(\iota)}(a(\iota) - b(\iota)\alpha_f) = F(a(\iota), b(\iota)),$$

where $F(a, b) = \sum_{i \in [0,d]} f_i\, a(t)^i b(t)^{d-i}$. The $i$-th term of this sum is a product
of $f_i$ and of $d$ factors that are polynomials of degree less than $n$. Each term of
the sum is therefore a polynomial of degree less than or equal to $d(n - 1)$ with
coefficients bounded by $\|f\|_\infty A^d n^d$. Therefore, we have

$$\|F(a, b)\|_\infty \le (d + 1) \|f\|_\infty A^d n^d.$$

Finally, since $h$ is monic, we have $N_{\mathbb{Q}(\iota)/\mathbb{Q}}(F(a, b)) = \mathrm{Res}(h, F(a, b))$, and we
can apply Theorem 2 to get the following upper bound:

$$\begin{aligned} N_{\mathbb{Q}(\iota)/\mathbb{Q}}(F(a, b)) &\le \|F(a, b)\|_\infty^{n} \|h\|_\infty^{d(n-1)} (n + 1)^{d(n-1)/2} (d(n - 1) + 1)^{n/2} \\ &\le \|h\|_\infty^{d(n-1)} A^{nd} \|f\|_\infty^{n} (d + 1)^{3n/2} (n + 1)^{(3d+1)n/2}. \end{aligned}$$

If the polynomials $f$, $g$ or $h$ are not monic, the theorem does not apply, since
the element $a(\iota) - b(\iota)\alpha_f$ is not an integer anymore. However, the denominators,
that are powers of the primes dividing the leading coefficients, are under control
in terms of smoothness (it suffices to add a few prime ideals in the factor bases).
And in fact, the quantity based on resultants computed in the proof of the
theorem is the one that is really used for smoothness testing. Therefore, the
monic hypothesis is not a restriction, and is just there to avoid technicalities.

It remains to plug in $\|h\|_\infty = O(1)$ and the bounds for $\|f\|_\infty$ and $\|g\|_\infty$ coming
from our choice of polynomial selection and we get:

$$N_{K_f/\mathbb{Q}}(a - b\alpha_f) \le \left(A^{nd} \|f\|_\infty^{n}\right)^{1+o(1)} = \left(E^d Q^{1/(d+1)}\right)^{1+o(1)}, \qquad (2)$$

and

$$N_{K_g/\mathbb{Q}}(a - b\alpha_g) \le \left(A^{n} \|g\|_\infty^{n}\right)^{1+o(1)} = \left(E\, Q^{1/(d+1)}\right)^{1+o(1)}, \qquad (3)$$

where we have set $E = A^n$, so that the quantity of pairs that are tested is $E^2$,
just like in the classical NFS analysis. It is to be noted that the contribution of
$C(n, d)$ remains negligible. Indeed, it would reach a value of the form $L_Q(2/3)$
only when $n$ gets larger than an expression of the form $(\log Q/\log\log Q)^{1/3}$,
which is not the case, since we ask that $p$ is larger than any expression of the
form $L_Q(2/3)$. It is worth noticing that the expressions for the norms are the
same as for the prime field case, where $Q = p$.
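
As a worked instance of bounds (2) and (3), the sketch below returns the approximate bit sizes of the two norms, dropping the $1 + o(1)$ exponents and the factor $C(n, d)$. The function name and the sample values are illustrative assumptions, not parameters taken from the paper.

```python
def norm_bits(q_bits, e_bits, d):
    """Approximate log2 of the f-side and g-side norm bounds,
    E^d * Q^(1/(d+1)) and E * Q^(1/(d+1)), given log2(Q) and log2(E)."""
    f_side = d * e_bits + q_bits / (d + 1)
    g_side = e_bits + q_bits / (d + 1)
    return f_side, g_side

# Example: a 600-bit field, E = 2^30 and deg f = 5.
print(norm_bits(600, 30, 5))
```

Raising $d$ shrinks the $Q^{1/(d+1)}$ factor on both sides but inflates $E^d$ on the $f$-side, which is the trade-off resolved by the choice of $d$ in the complexity analysis.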


3.3 Writing and Solving Linear Equations 

Mapping a factorization of ideals to a linear combination of logarithms is not
immediate unless the ring is principal and there are no units other than $\pm 1$; both
things are highly unlikely since the fields $K_f$ and $K_g$ have large degrees over $\mathbb{Q}$.
Therefore, we have to resort to the notion of virtual logarithms, just like in the
classical case.

For this, it is easier to work with absolute extensions. Then, we can use the 
same strategy as in Sect. 4.3 of [21], that we summarize in the following theorem 
which can be applied to Kf and K g . 

Theorem 4 ([21, Section 4.3]). Let $K = \mathbb{Q}(\theta)$ be a number field and $\mathfrak{P}$ a
non-ramified ideal of its ring of integers $\mathcal{O}_K$, with residual field isomorphic to $\mathbb{F}_{p^n}$ in
which we fix a generator $t$. Let $\ell$ be a prime factor of $p^n - 1$ and let
$U = \{x \in K \mid \forall \mathfrak{L} \text{ above } \ell,\ \mathrm{val}_{\mathfrak{L}}(x) = 0\}$.

We assume that there exists a Schirokauer function, i.e. an injective group
homomorphism $\lambda = (\lambda_1, \dots, \lambda_r) : (U/U^\ell, \cdot) \to (\mathbb{Z}/\ell\mathbb{Z}, +)^r$, where $r$ is the unit
rank of $\mathcal{O}_K$.

Assuming furthermore that $\ell$ neither divides the class number of $K$ nor its
discriminant, the following holds.

There exists a map $\log : \{\text{ideals of } \mathcal{O}_K \text{ coprime to } \mathfrak{P}\} \to \mathbb{Z}/\ell\mathbb{Z}$ and a map
$\chi : \{1, \dots, r\} \to \mathbb{Z}/\ell\mathbb{Z}$ called virtual logarithms, so that, for all $\phi \in \mathbb{Z}[x]$ such
that $\phi(\theta)$ is in $U$ and coprime to $\mathfrak{P}$, we have

$$\log_t \overline{\phi(\theta)} = \sum_{\mathfrak{Q} \text{ prime ideal}} \mathrm{val}_{\mathfrak{Q}}(\phi(\theta)) \log \mathfrak{Q} + \sum_{j=1}^{r} \lambda_j(\phi(\theta))\,\chi_j, \qquad (4)$$

where $\overline{\phi(\theta)}$ is the projection of $\phi(\theta)$ in the residual field $\mathbb{F}_{p^n}$ of $\mathfrak{P}$.

In [33], Schirokauer explained how to construct an explicitly and efficiently 
computable map A as in the theorem and brought heuristics to support the 
assumptions. These heuristics and the fact that the other hypothesis of the 
theorem are expected to be true rely on the condition that £ is not too small. 
These are the main reasons why we asked that i grows at least like Lq (1/3) in 
the beginning. 

For each (a, 6)-pair that gives two smooth ideals in Kf and K g , the element 
a(t) — b(t)af can be expressed in the absolute representation of Kf = Q (Of) by 
a polynomial form 0/(0/), and similarly a(i) — b(i)a g can be written fi g (O g ) in 
K g = Q(0 g ). We refer for instance to [14] for algorithms to manipulate relative 
extensions as absolute extensions. Then, applying Theorem 4 to 0/ in Kf and 
fi g in K g , we obtain two linear expressions that must be equal, since they both 
correspond to the logarithm of the same element in F p n . 

As a consequence, each relation is rewritten as a linear equation between the
virtual logarithms of the elements of the factor base and the $\chi_j$ for each field.
We make the now classical heuristic assumption that, after collecting roughly as
many relations as the size of the factor base (say, a polynomial factor more),
the resulting linear system has a kernel of dimension one. A vector of this
kernel is computed using Wiedemann's algorithm [36] in quasi-quadratic time
$B^{2+o(1)}$. This gives the logarithms of all the ideals in the factor base.
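
As an illustration of this linear algebra step, the following minimal sketch recovers a kernel vector modulo a prime $\ell$ for a toy system. It uses dense Gaussian elimination as a stand-in for Wiedemann's algorithm, which is what one would use on the large sparse systems arising in practice; the function name and the toy relations are ours and purely illustrative.

```python
def kernel_vector_mod(rows, ell):
    """Return a nonzero x with rows * x = 0 (mod ell), or None if the kernel is trivial.

    Dense Gaussian elimination over Z/ell (ell prime). This is only a
    stand-in for Wiedemann's algorithm, which exploits sparsity.
    """
    m = [[c % ell for c in row] for row in rows]
    ncols = len(m[0])
    pivots = {}                      # pivot column -> row index
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c]), None)
        if p is None:
            continue                 # no pivot here: c is a free column
        m[r], m[p] = m[p], m[r]
        inv = pow(m[r][c], -1, ell)  # modular inverse (Python 3.8+)
        m[r] = [(x * inv) % ell for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c]:
                f = m[i][c]
                m[i] = [(x - f * y) % ell for x, y in zip(m[i], m[r])]
        pivots[c] = r
        r += 1
    free = [c for c in range(ncols) if c not in pivots]
    if not free:
        return None
    x = [0] * ncols
    x[free[0]] = 1                   # set one free unknown, solve for the pivots
    for c, i in pivots.items():
        x[c] = (-m[i][free[0]]) % ell
    return x


# Toy relations between three unknown logarithms modulo ell = 101.
relations = [[7, -5, 0], [0, 11, -7]]
print(kernel_vector_mod(relations, 101))
```

The kernel vector is only defined up to a scalar, which corresponds to the choice of the base of the logarithm.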


3.4 Overall Complexity of the Main Phase 


From the previous sections, we can now conclude about the complexity of the
main steps of the algorithm. In fact, with our choice for the polynomial selection,
and the kind of $(a, b)$-pairs that we test for smoothness, we have obtained exactly
the same expressions for the sizes of the norms as in the usual NFS complexity
analysis for prime fields, and in particular the same probability Prob that the
product of the norms is smooth. Since the linear algebra step is also similar,
the final complexity is the same: we have to minimize $B^2 + E^2$ subject to
the condition $E^2 \cdot \mathrm{Prob} \geq B^{1+o(1)}$, and we refer for example to Conjecture 11.2
of [13]. Hence, the optimal values of the parameters are $E = B = L_Q(1/3, \sqrt[3]{8/9})$
and $d = \sqrt[3]{3}\,(\log Q / \log\log Q)^{1/3}$, and the heuristic complexity of the main phase of
TNFS is $L_Q(1/3, \sqrt[3]{64/9})$.
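
The constants of the classical analysis can be checked numerically. The sketch below assumes the standard heuristic model, which is not spelled out in this section: the product of the norms has size $E^{d+1} Q^{k/d}$ with $k = 2$ for general primes, $E = B = L_Q(1/3, \beta)$, $d = \delta (\log Q/\log\log Q)^{1/3}$, and the Canfield-Erdős-Pomerance estimate turns the condition $E^2 \cdot \mathrm{Prob} = B$ into $3\beta^2 = \delta\beta + k/\delta$, all $o(1)$ terms being ignored. The function names and the parameter `k` are ours.

```python
import math


def beta_for(delta, k):
    """Positive root of 3*beta^2 = delta*beta + k/delta."""
    return (delta + math.sqrt(delta * delta + 12.0 * k / delta)) / 6.0


def minimize(f, lo, hi, iters=200):
    """Ternary search for the minimizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def optimal_constants(k):
    """Return (delta, beta, c): constants of d, of E = B, and of the total cost."""
    delta = minimize(lambda t: beta_for(t, k), 0.2, 5.0)
    beta = beta_for(delta, k)
    return delta, beta, 2 * beta     # cost E^2 + B^2 = L_Q(1/3, 2*beta)


delta, beta, c = optimal_constants(2)
print(delta**3, beta**3, c**3)       # compare with 3, 8/9 and 64/9
```

Calling `optimal_constants(1)` in the same model gives the constants of the special (SNFS) case, where the product of the norms has size $E^{d+1} Q^{1/d}$.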



3.5 Individual Logarithms 

Let $s$ be an element of $\mathbb{F}_{p^n}^*$ for which we want to compute the discrete logarithm.
If $s$ is very small, then it factors into ideals of the factor base, and its logarithm
is easily retrieved. However, in general, this requires a two-phase process that is
not so trivial, although its cost is negligible compared to the other steps.

First, in what we call a smoothing phase, the element $s$ is randomized and
tested for $B_1$-smoothness with the ECM algorithm. The bound $B_1$ will be of the
form $L_Q(2/3)$, so that the cost of the smoothing test is in $L_Q(1/3)$.

Thereafter, each prime ideal $\mathfrak{Q}$ which is not in the factor base is considered
as a special-q and we search for a relation involving $\mathfrak{Q}$ and other smaller ideals.
Continuing recursively, we get a special-q descent tree, from which the logarithm
of $s$ can be deduced.

Smoothing. The randomization is simple: we compute $z = s^e$ in $\mathbb{F}_{p^n}$ for random
values $e$, and test $z$ for smoothness. The logarithm of $s$ is just the logarithm of
$z$ divided by $e$ modulo $\ell$.
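
The randomization identity can be illustrated on a toy prime field (the case $n = 1$). In the sketch below, the values $p = 1019$ and $\ell = 509$, the brute-force logarithm table and all function names are ours; in the real algorithm the logarithm of $z$ is obtained from the factor base once $z$ is found to be smooth, not from a table.

```python
import random


def find_generator(p, prime_factors):
    """Smallest generator of F_p^*, given the prime factors of p - 1."""
    for g in range(2, p):
        if all(pow(g, (p - 1) // f, p) != 1 for f in prime_factors):
            return g
    raise ValueError("no generator found")


def dlog_table(p, g):
    """Brute-force table x -> log_g(x); only sensible for toy sizes."""
    table, x = {}, 1
    for k in range(p - 1):
        table[x] = k
        x = x * g % p
    return table


def log_via_randomization(s, p, ell, table, rng):
    """Recover log(s) mod ell from the logarithm of z = s^e."""
    e = rng.randrange(1, ell)        # nonzero mod the prime ell, hence invertible
    z = pow(s, e, p)
    log_z = table[z] % ell           # stand-in for smoothness test + factor base
    return log_z * pow(e, -1, ell) % ell


p, ell = 1019, 509                   # p - 1 = 2 * 509
g = find_generator(p, [2, 509])
table = dlog_table(p, g)
s = 777
print(log_via_randomization(s, p, ell, table, random.Random(1)), table[s] % ell)
```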

To be more precise, the smoothness is not tested for the element $z$ as an
element of the finite field, but for the corresponding element in $K_g$. Indeed, in
our construction, $z \in \mathbb{F}_{p^n}$ is represented by a polynomial of degree less than
$n$ with coefficients modulo $p$. Lifting these coefficients to integers, we obtain a
polynomial which makes sense modulo $h$, therefore an element of $\mathbb{Q}(\iota) = K_g$
(this is where we use that $g$ is linear). As usual, to test the smoothness of $z$ as an
element of $\mathbb{Q}(\iota)$, we test the smoothness of its norm as an integer. Using again
the estimate of Theorem 3, the size $S$ of this norm is $Q^{1+o(1)}$.

The bound $B_1$ can then be optimized with respect to this step alone, as in the
classical NFS: if it is too small, the probability of being smooth is too small,
while if it is too large, the cost of testing the smoothness by ECM is prohibitive.
The analysis is the same as in [15] and gives a value $B_1 = L_Q(2/3, (1/3)^{1/3})$; the
corresponding cost for the smoothing phase is $L_Q(1/3, 3^{1/3})$.
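
This trade-off can be reproduced numerically. The sketch below is our own restatement under the usual heuristics, not a transcription of [15]: for $B_1 = L_Q(2/3, c)$, an integer of size $Q$ is $B_1$-smooth with probability $L_Q(1/3, -1/(3c))$ by the Canfield-Erdős-Pomerance estimate, and one ECM smoothness test costs $L_{B_1}(1/2, \sqrt{2}) = L_Q(1/3, 2\sqrt{c/3})$. The total cost is then $L_Q(1/3, 1/(3c) + 2\sqrt{c/3})$, which we minimize over a grid.

```python
import math


def smoothing_exponent(c):
    """Constant of the L_Q(1/3, .) smoothing cost when B_1 = L_Q(2/3, c)."""
    trials = 1.0 / (3.0 * c)         # expected number of candidates z = s^e
    ecm = 2.0 * math.sqrt(c / 3.0)   # cost of one ECM smoothness test
    return trials + ecm


# Grid search over c in [0.1, 2.0) with step 1e-4.
best_c = min((k / 10000 for k in range(1000, 20000)), key=smoothing_exponent)
print(best_c**3, smoothing_exponent(best_c)**3)   # compare with 1/3 and 3
```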

After the smoothing phase, the logarithm of $s$ has been rewritten in terms
of the logarithms of small prime ideals of $K_g$, for which the logarithm is already
known, and some largish prime ideals of $K_g$, of norm bounded by $B_1$. The next
step is to compute the logarithms of these largish ideals.

Descent by Special-q. As in NFS, the algorithm is recursive: if $\mathfrak{Q}$ is a prime
ideal of degree one in $K_f$ (respectively $K_g$), then we write $\log \mathfrak{Q}$ as a formal
sum of virtual logs of ideals $\mathfrak{Q}'$ of $K_f$ and $K_g$ with norm less than $N(\mathfrak{Q})^c$, for
a positive parameter $c < 1$. For this, we consider the lattice of $(a, b)$-pairs for
which $\mathfrak{Q}$ divides the element $a - b\alpha_f$ (resp. $a - b\alpha_g$). A basis for this lattice
can be constructed and LLL-reduced. Small combinations of these basis vectors
are then formed and the norms of the corresponding $(a, b)$-pairs are tested for
$N(\mathfrak{Q})^c$-smoothness. We refer to Appendix 7.1 for the description of this special-q
lattice technique, which is also used in practice during the collection of relations
in the main stage. When a relation is found, this gives a new node in the descent
tree, its children being the ideals of the relation that are still too large to
be in the factor base. The total number of nodes is quasi-polynomial.

The cost of each step is determined by the size of $N(a(\iota) - \alpha_f b(\iota))$ (resp.
$N(a(\iota) - \alpha_g b(\iota))$), the norms tested during the computations. The matrix $M_{\mathfrak{Q}}$ of
the basis of the lattice has determinant $\det M_{\mathfrak{Q}} = N(\mathfrak{Q})$, so a short vector in the
LLL-reduced basis has coordinates of size $\approx N(\mathfrak{Q})^{1/(2n)}$. We make the heuristic
assumption that all the vectors of the reduced basis, $(a^{(k)}, b^{(k)})$ for $k = 1, \ldots, 2n$,
have coordinates of the same size. The pairs $(a, b)$ tested for smoothness are
linear combinations $(a, b) = \sum_{k=1}^{2n} i_k (a^{(k)}, b^{(k)})$ where the $i_k$ are rational integers
with absolute value less than a parameter $A'$; we set $E' = (A')^n$. By Theorem 3,
the size of the norms tested for smoothness is

$$N_{K_f/\mathbb{Q}}(a - b\alpha_f) \leq \bigl(\max(\|a\|_\infty, \|b\|_\infty)^{nd}\, \|f\|_\infty^{n}\bigr)^{1+o(1)} = \bigl(N(\mathfrak{Q})^{d/2} (E')^{d} Q^{1/d}\bigr)^{1+o(1)},$$

$$N_{K_g/\mathbb{Q}}(a - b\alpha_g) \leq \bigl(\max(\|a\|_\infty, \|b\|_\infty)^{n}\, \|g\|_\infty^{n}\bigr)^{1+o(1)} = \bigl(N(\mathfrak{Q})^{1/2} E' Q^{1/d}\bigr)^{1+o(1)}.$$

These expressions coincide with the ones in the analogous stage of the classical
variant (for example in Equation (7.11) in [5]) and we obtain a complexity of
$L_Q(1/3, 1.1338\ldots)$, which is the same as in the classical case [15]. We conclude that
the overall complexity of individual logarithms is dominated by the $L_Q(1/3, 3^{1/3})$
complexity of the smoothing test.

4 Variants 

Note on the Boundary Case. TNFS can be applied to the boundary case
$p = L_Q(2/3, c_p)$, $c_p > 0$, where one obtains a complexity $L_Q(1/3, c)$. The
constant $c$ is strictly larger than $(64/9)^{1/3}$ as the factor $C(n, d)$ in Eq. (1) is not
negligible any more. Yet, for some values of $c_p$, TNFS outperforms the method
of [21], which was the state of the art until recently. Using the generalized
Joux-Lercier method, the authors of [6,7] reduced the constant $c$ to $(64/9)^{1/3} \approx 1.92$
and Pierrot [31] showed that a multiple fields variant allows one to further reduce
$c$ to $\approx 1.90$. Therefore, we do not reproduce here the tedious computations of the
complexity in the boundary case.

The Case of Primes of Special Form (SNFS). Given a positive integer
$d$, an integer $p$, not necessarily prime, is said to be a $d$-SNFS integer if it can
be written as $p = P(u)$ for some integer $u \approx p^{1/d}$ and a polynomial $P \in \mathbb{Z}[x]$
such that $\|P\|_\infty$ is small (say, bounded by a constant). We remark that when
a number is SNFS, there can be several valid choices for $d$ and $P$. This is
typically the case for integers of the form $2^k + \varepsilon$, for tiny $\varepsilon$.
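
To make the last remark concrete, the sketch below lists several ways of writing $2^k + \varepsilon$ as $P(u)$ with $P = 2^r x^d + \varepsilon$ and $u = 2^{\lfloor k/d \rfloor}$, where $r = k \bmod d$. The function name and this particular family of polynomials are our illustrative choices; other representations exist.

```python
def snfs_forms(k, eps, degrees):
    """For p = 2^k + eps, list (d, coeffs, u) with p = P(u), P = 2^r x^d + eps.

    coeffs lists the coefficients of P from the leading term down.
    """
    p = 2**k + eps
    forms = []
    for d in degrees:
        q, r = divmod(k, d)
        u = 2**q                                   # u is close to p^(1/d)
        coeffs = [2**r] + [0] * (d - 1) + [eps]
        # Sanity check: P(u) really equals p.
        assert sum(c * u**(d - i) for i, c in enumerate(coeffs)) == p
        forms.append((d, coeffs, u))
    return forms


# The 1039-bit number 2^1039 - 1 admits, among others, a sextic form.
for d, coeffs, u in snfs_forms(1039, -1, range(4, 9)):
    print(d, coeffs[0], u.bit_length() - 1)        # degree, leading coeff, log2(u)
```

The leading coefficient $2^r$ stays below $2^d$, so for small $d$ all these polynomials have small height, and the same integer is $d$-SNFS for several values of $d$.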

When solving DLP in fields $\mathbb{F}_{p^n}$ for $d$-SNFS primes $p$, we can follow the
classical SNFS construction [27] and set $f(x) = P(x)$ and $g(x) = x - u$, which
is possible since $f$ and $g$ share the root $u$ modulo $p$.

When evaluating the sizes of the norms, Eq. (2) can be restated with $\|f\|_\infty = O(1)$,
so we obtain the following bound:

$$N_{K_f/\mathbb{Q}}(a - b\alpha_f)\, N_{K_g/\mathbb{Q}}(a - b\alpha_g) \leq \bigl(E^{d+1} Q^{1/d}\bigr)^{1+o(1)}. \qquad (5)$$

Following the analysis of Semaev [35], we obtain that if the degree $d$ can be
chosen to grow precisely as $d = \sqrt[3]{3/2}\,(\log Q / \log\log Q)^{1/3}$, then the overall
complexity of SNFS is the same as that of factoring numbers from the Cunningham
project, namely $L_Q(1/3, \sqrt[3]{32/9})$.

Using Multiple Number Fields (MNFS). Given a choice of polynomials $f$
and $g$ selected as in the first step of TNFS, one can construct a large number
of polynomials $f_{\mu,\nu}$ which share with $f$ and $g$ the root $m$ modulo $p$. The idea
goes back to Coppersmith's variant of NFS for factorization [16] and has been
used again in [8,28] and [31]. Let $V$ be a parameter of size $L_Q(1/3, c_v)$ for some
constant $c_v > 0$. For all $\mu(\iota)$ and $\nu(\iota) \in \mathbb{Z}[\iota]$ such that $\deg \mu, \deg \nu \leq n - 1$ and
$\|\mu\|_\infty, \|\nu\|_\infty \leq V^{1/(2n)}$, we set

$$f_{\mu,\nu} = \mu(\iota) f + \nu(\iota) g, \qquad (6)$$

keeping only those polynomials that are irreducible (most of them are, so we
expect that the bounds on $\|\mu\|_\infty$ and $\|\nu\|_\infty$ need only a marginal
adjustment). Let us denote by $K_{f_{\mu,\nu}}$ the number field generated by $f_{\mu,\nu}$
over $\mathbb{Q}(\iota)$, and call $\alpha_{\mu,\nu}$ a root of $f_{\mu,\nu}$ in its number field. For any pair $(\mu, \nu)$ as
above and $(a, b)$ in the sieving domain, by Theorem 3 we have

$$N_{K_{f_{\mu,\nu}}/\mathbb{Q}}(a - \alpha_{\mu,\nu} b) \leq E^{d} \bigl(V^{1/(2n)} \|f\|_\infty\bigr)^{n} \|h\|_\infty^{d(n-1)} C(n, d) = \bigl(V^{1/2} E^{d} Q^{1/d}\bigr)^{1+o(1)}. \qquad (7)$$

In the multiple number field sieve a relation is given by a pair $(a, b)$ in the
sieving domain and a polynomial $f_{\mu,\nu}$ from the set constructed above so that
$N_{K_g/\mathbb{Q}}(a - b\alpha_g)$ is $B$-smooth and $N_{K_{f_{\mu,\nu}}/\mathbb{Q}}(a - b\alpha_{\mu,\nu})$ is $B/V$-smooth. We use as
factor base the set $\mathcal{F}$ consisting of the prime ideals of $K_g$ of norm less than $B$
and, for all pairs $(\mu, \nu)$, the prime ideals of $K_{f_{\mu,\nu}}$ of norm less than $B/V$.

We collect relations as in Coppersmith's modification: collect pairs $(a, b)$ in the
sieving domain and keep only those for which $N_{K_g/\mathbb{Q}}(a - \alpha_g b)$ is $B$-smooth. Then,
for each surviving pair $(a, b)$ we use ECM to collect polynomials $f_{\mu,\nu}$ such that
$N_{K_{f_{\mu,\nu}}/\mathbb{Q}}(a - \alpha_{\mu,\nu} b)$ is $B/V$-smooth.

We choose the parameter $E$ so that the number of collected pairs exceeds $2B$,
which is an upper bound on $\#\mathcal{F}$. The same considerations as in [16] allow
us to find the optimal parameters: $V = L_Q\bigl(1/3, \bigl(\tfrac{2\sqrt{13} - 7}{108}\bigr)^{1/3}\bigr)$,
$E = B = L_Q\bigl(1/3, \bigl(\tfrac{46 + 13\sqrt{13}}{108}\bigr)^{1/3}\bigr)$ and $d = \delta (\log Q / \log\log Q)^{1/3}$ where
$\delta = \bigl(\tfrac{32 - 2\sqrt{13}}{9}\bigr)^{1/3}$. The complexity of the multiple field variant of TNFS is
$L_Q\bigl(1/3, \bigl(\tfrac{92 + 26\sqrt{13}}{27}\bigr)^{1/3}\bigr)$.

Automorphisms. Joux, Lercier, Smart and Vercauteren [21] proposed an
improvement based on the field automorphisms of the number fields occurring
in NFS. A recent preprint proves (a reformulation of) the following result:

Theorem 5 (Theorem 3.5(i) of [6]). Let $\sigma$ be a field automorphism of $K/\mathbb{Q}$.
Assume that $\mathfrak{P}$ is a prime ideal of $K$ above $p$ such that $\sigma\mathfrak{P} = \mathfrak{P}$. Fix a prime $\ell$
dividing $N(\mathfrak{P}) - 1$, coprime to the class number and the discriminant of $K$. Fix
a generator $t$ of the residual field of $\mathfrak{P}$ and, for any prime ideal $\mathfrak{q}$, denote by
$\log \mathfrak{q}$ the virtual logarithm of $\mathfrak{q}$ with respect to $t$ and a set of explicit generators
so that $\gamma_{\sigma(\mathfrak{q})} = \sigma(\gamma_{\mathfrak{q}})$. Then, there exists a constant $\kappa \in [0, \mathrm{ord}(\sigma) - 1]$ such
that for any $\mathfrak{q}$ we have

$$\log(\sigma\mathfrak{q}) \equiv p^{\kappa} \log(\mathfrak{q}) \bmod \ell.$$

In Sect. 3.1 we noted that one might find $\iota$ so that $\mathbb{Q}(\iota)/\mathbb{Q}$ has $n$
automorphisms. All these automorphisms can be used to speed up computations,
using the following result.

Corollary 6. Let $\sigma$ be an automorphism of $\mathbb{Q}(\iota)/\mathbb{Q}$ and call $\tilde\sigma$ the unique field
automorphism of $K_f$ such that $\tilde\sigma(\iota) = \sigma(\iota)$ and $\tilde\sigma(\alpha_f) = \alpha_f$. Assume that $f$ has
small coefficients so that virtual logarithms are defined using explicit generators.
Then, there exists $\kappa \in [0, \mathrm{ord}(\sigma) - 1]$ such that, for all prime ideals $\mathfrak{q}$ of $K_f$, we
have

$$\log(\tilde\sigma\mathfrak{q}) \equiv p^{\kappa} \log \mathfrak{q} \bmod \ell.$$

Proof. The only non-trivial condition, $\tilde\sigma\mathfrak{P}_f = \mathfrak{P}_f$, is tested directly:

$$\tilde\sigma\mathfrak{P}_f = \tilde\sigma\langle p\mathbb{Z}[\iota], \alpha_f - m\rangle = \langle \sigma(p)\mathbb{Z}[\iota], \tilde\sigma(\alpha_f) - \sigma(m)\rangle = \langle p\mathbb{Z}[\iota], \alpha_f - m\rangle = \mathfrak{P}_f.$$

According to [7], automorphisms allow us to sieve $n$ times faster and to speed
up the linear algebra stage by a factor $n^2$. Note that, contrary to the classical
variant of NFS where automorphisms were available only for certain values of $n$,
TNFS has no restrictions.

5 Comparison for Cryptographically Relevant Sizes 

The complexity of NFS and its many variants is written in the form $C^{1+o(1)}$,
which can hide large factors, and therefore we cannot decide which variant to
implement based only on asymptotic complexity. We follow the methodology
of [7, Section 4.4] and do a more precise comparison by evaluating the upper
bound on the size of the integers which are tested for smoothness: the product
of the norms with respect to the two sides. In particular, we make explicit the
negligible terms of Eqs. (2) and (3) using Theorem 3.


5.1 The Case of General Primes 

When $p$ is not an SNFS number, we compare TNFS to the algorithm of Joux,
Lercier, Smart and Vercauteren (JLSV) [21]. From Eqs. (2) and (3) we find a
formula for the logarithm of the product of the norms in TNFS:

$$C_{\mathrm{TNFS}} = (d + 1) \log_2 E + \frac{2}{d+1} \log_2 Q = C_{\mathrm{NFS}},$$

where $d = \deg f$ can be chosen as desired (unlike the SNFS variant of the
algorithm where $d$ might be imposed by the shape of $p$). It is remarkable that
this formula is the same as for NFS in the integer factorization case.

Since the JLSV algorithm comes with a variety of methods of polynomial
selection, we cannot give a unified formula for the size of the norms' product, so we
use the minimum of the formulae in [7]. Therefore, in the following, when we say
JLSV, this covers both variants explained in [21] as well as the Conjugation and
Generalized Joux-Lercier methods. The choice of the parameter $E$ depends on
the size of the norms, but for a first comparison we can use the default values of
CADO-NFS [7, Table 2].

Fig. 2. Comparison of TNFS (in black) and the best variant of the JLSV algorithm
(in dash-dotted blue). Vertical axis: bitlength of the norms' product; horizontal axis:
bitlength of $p^n$ (Color figure online).

In Fig. 2 we compare TNFS to JLSV when $p$ is a general prime (not SNFS),
for a range $400 \leq \log_2 Q \leq 1000$. We conclude that in this range, when $n \geq 5$,
TNFS is competitive and must be kept for an even more accurate comparison.

5.2 The Case of Primes of Special Shape (SNFS) 

The Importance of the d Parameter. If we want to compute discrete
logarithms in a field $\mathbb{F}_{p^n}$ such that $p$ is $d$-SNFS for a parameter $d$, then the first
question to ask is whether to use a general algorithm like TNFS or JLSV, or
a specialized variant of these two, namely the SNFS variant of TNFS, which we
denote STNFS, or the Joux-Pierrot algorithm.

When $d = 6$ we can rely on a real-life example: Aoki et al. [2] factored a
1039-bit integer with SNFS, using sextic polynomials, i.e. $d = 6$. The current
record, held by Kleinjung et al. [26], was obtained with an MNFS variant and
targeted $d$-SNFS integers for $d = 8$. Their computations were much faster than
the estimated time to factor a 1024-bit RSA modulus, so it is safe to say that
SNFS is the best option when $\log_2 Q \approx 1024$ and $d = 6$, or when $d = 8$ for slightly
larger targets. However, the value of $d$ is fixed in most cases and can take very
different values among curves used in pairing-based cryptosystems, going from
$d = 2$ for MNT curves [29] to $d = 56$ in other constructions [18, Table 6.1], [30].




If the polynomial $P$ such that $p = P(u)$ has a special shape, one can try to
reduce the value of $d$ using techniques from the Cunningham project records.
On the one hand, if $P = T(x^a)$ with $T \in \mathbb{Z}[x]$ and $a \in \mathbb{N}$, we can also write
$p = T(v)$ with $v = u^a$, so $p$ is $(\deg T)$-SNFS. This technique can be used for
example in the construction of Brezing-Weng [12, Section 3, item 3(b)] where
$P(x) = \mu a^2 + \nu b^2$ for some small constants $\mu$ and $\nu$ and where $a, b \in \mathbb{Z}[x^5]$
have degree 5 and respectively 15; we replace $P$ of degree 30 by a polynomial of
degree 6.

On the other hand, a construction of Freeman, Scott and Teske [18,
Construction 6.4] allows one to divide the degree by 2. Indeed, in that case the
polynomial $P$ is almost a palindrome, in the sense that it can be written, up to a
rational constant factor, as $x^{(\deg P)/2}\, T(x - \frac{1}{x})$ with $T \in \mathbb{Z}[x]$. Then we select
$f = T(x)$ and $g = ux - (u^2 - 1)$, which share the root $u - \frac{1}{u}$ modulo $p$ and are
such that $\|f\|_\infty = O(1)$ and $\|g\|_\infty = p^{1/\deg f}$.

Modeling. A good comparison requires a precise estimation of the norms.
However, several factors in Eq. (1) can be negligible in some cases but very
large in others:

$$\text{negligible factors} = C(n, d)\, \|f\|_\infty^{n}\, \|h\|_\infty^{d(n-1)}.$$

The factor $C(n, d)$ is itself a bad estimation of the number of terms in the
Sylvester discriminant, which can vary between 6 bits for $n = 2$ and $d = 3$ to 15
bits for $n = 5$ and $d = 6$. This leads us to restrict to $n \leq 5$ and $d \leq 6$. The
factor $\|f\|_\infty^{n}$ equals 1 if $\|f\|_\infty = 1$ but can be as large as $2^{62}$ when $n = 12$ and
$\|f\|_\infty = 36$. Hence, it is impossible to plot the size of the norms for all SNFS
numbers, independently of the polynomial $f$.

For our modeling, we consider the case $\|f\|_\infty = \|h\|_\infty = 1$ and neglect
the combinatorial factor $C(n, d)$ for small values of $n$ and $d$. From Eq. (5) the
dominant factor in the product of the norms for STNFS is

$$C_{\mathrm{STNFS}}(n, d) = \log(E^{d+1}) + \log(Q^{1/d}).$$

Note again that this formula is the same as that of the complexity of the factoring
variant of SNFS.

The product of the norms in the Joux-Pierrot algorithm is bounded by
$(n+1)^{2t} (\log n)^{nd} E^{2n(d+1)/t} Q^{(t-1)/(nd)}$ (discussion preceding Eq. (5) in [22]),
and for the comparison we keep only the logarithm of the most important factors:

$$C_{\mathrm{JP}}(n, d, t) = \frac{2n}{t} \log(E^{d+1}) + \frac{t-1}{n} \log(Q^{1/d}).$$

Let us see two examples in which we tackle fields of about one kilobit, for
which we use the approximation $\log_2 E = 30.4$, as in [2].

A First Example. We target a 1024-bit field $\mathbb{F}_{p^2}$ for a 6-SNFS prime $p$ and we
set the parameters equal to their values in the computation of Aoki et al. If one
chooses to forget that $p$ has a special shape and uses JLSV with the conjugation
method, then the product of the norms has bitsize $\approx 439$. If instead one uses the
special shape of $p$, the product of the norms for STNFS has bitsize
$C_{\mathrm{STNFS}}(n = 2, d = 6) \approx 386$, while the best parameters for the Joux-Pierrot
algorithm yield $C_{\mathrm{JP}}(n = 2, d = 6, t = 3) \approx 457$. A probabilistic experiment suggests
that our model is quite precise, as the negligible factors do not add more than 6 bits.

Barreto-Naehrig. The elliptic curves proposed by Barreto and Naehrig [9]
correspond to finite fields of parameters $n = 12$ and $d = 4$. We tackle a field of
1024-bit cardinality and use a value of $E$ close to the one in the factorization
record, i.e. $\log_2 E = 30.4$. If we forget that $p$ is SNFS, then we can
choose the value of $d$ in TNFS and we find $C_{\mathrm{TNFS}}(n = 12, d = 7) = 500$. If
instead we use the special shape of $p$ we obtain $C_{\mathrm{STNFS}}(n = 12, d = 4) = 408$
and $C_{\mathrm{JP}}(n = 12, d = 4, t = 12) = 539$.
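
The figures of the two examples can be re-derived with a few lines of code. The sketch below encodes the model formulas of this section as we read them: $C_{\mathrm{TNFS}} = (d+1)\log_2 E + \frac{2}{d+1}\log_2 Q$, $C_{\mathrm{STNFS}} = (d+1)\log_2 E + \frac{1}{d}\log_2 Q$ and $C_{\mathrm{JP}} = \frac{2n}{t}(d+1)\log_2 E + \frac{t-1}{n}\cdot\frac{1}{d}\log_2 Q$. The function names are ours, and for the first example we assume $\log_2 Q = 1039$, the size of the integer factored by Aoki et al., since this value reproduces the quoted numbers.

```python
def c_tnfs(d, log2_E, log2_Q):
    """Model bitsize of the norms' product for TNFS (general primes)."""
    return (d + 1) * log2_E + 2.0 * log2_Q / (d + 1)


def c_stnfs(d, log2_E, log2_Q):
    """Model bitsize of the norms' product for STNFS (d-SNFS primes)."""
    return (d + 1) * log2_E + log2_Q / d


def c_jp(n, d, t, log2_E, log2_Q):
    """Model bitsize of the norms' product for the Joux-Pierrot algorithm."""
    return (2.0 * n / t) * (d + 1) * log2_E + ((t - 1) / n) * log2_Q / d


LOG2_E = 30.4

# First example: n = 2, d = 6, with log2 Q = 1039 (assumption, see above).
print(c_stnfs(6, LOG2_E, 1039), c_jp(2, 6, 3, LOG2_E, 1039))

# Barreto-Naehrig example: n = 12, d = 4, log2 Q = 1024.
print(c_tnfs(7, LOG2_E, 1024), c_stnfs(4, LOG2_E, 1024), c_jp(12, 4, 12, LOG2_E, 1024))
```

The values agree with those quoted in the text to within one bit.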

In that case, the extension degree $n$ (a.k.a. the embedding degree in the
pairing context) is already pretty large, so that we are not at all in the nominal
range of applicability of TNFS. As a consequence, our estimate for $C_{\mathrm{TNFS}}$ is way
too optimistic, since the so-called negligible factors are no longer small. But in
fact, it is not that bad: computing explicitly the norms for a sample of typical
pairs $(a, b)$ of the appropriate size shows that the product of the norms for STNFS
is 60 to 80 bits larger than the ideal model when $f = 36x^4 + 12x^3 + 16x^2 + 2x + 1$
and $h = x^{12} - x - 1$. Therefore, it might still be better than Joux-Pierrot.

There are however examples when the specialized algorithms do not apply. 

Fact 7. When $d = 2$, the JP and STNFS algorithms are not better than the
general ones, i.e.

$$C_{\mathrm{JLSV}} \leq \min(C_{\mathrm{JP}}, C_{\mathrm{STNFS}}),$$

where $C_{\mathrm{JLSV}}$ is the complexity of the JLSV algorithm with the conjugation method.

To see this, note first that the Joux-Pierrot algorithm keeps the stages of JLSV
unchanged once the polynomial selection is finished. In the Joux-Pierrot
algorithm one constructs polynomials $f$ and $g$ such that $\deg(f) = nd$, $\deg(g) = n$,
$\|f\|_\infty = O(1)$ and $\|g\|_\infty = Q^{1/(nd)}$. However, when $d = 2$, they have the same
characteristics as the polynomials constructed by the Conjugation method, which
applies to arbitrary primes.

Also note that STNFS uses a polynomial $g$ with coefficients of size $p^{1/d}$.
When $d = 2$ the norm of the $g$-side has bitsize larger than $\frac{1}{2} \log_2 Q$, which
is typical for algorithms of complexity $L_Q(1/2)$ and is larger than the norms
considered in the JLSV algorithm in the range $\log_2 Q \leq 1000$ and $n \leq 5$.


Plots. Let us plot the modelled bitsize of the norms' product for STNFS and
Joux-Pierrot in the range which is currently feasible or might become so in the near
future: $400 \leq \log_2 Q \leq 1000$. Together with $C_{\mathrm{STNFS}}$ and $C_{\mathrm{JP}}$ (Joux-Pierrot), we
also plot $C_{\mathrm{NFS}}$, which represents the bitsize of the product of the norms in NFS
when factoring RSA numbers. We make separate graphs for each pair $(n, d)$,
where $n$ is the degree of the target field and $d$ is the parameter such that $p$ is
$d$-SNFS, as those parameters are unique (in general) for each finite field: Fig. 3
($n = 2$), Fig. 4 ($n = 3$), Fig. 5 ($n = 4$) and Fig. 6 ($n = 5$). Although the value of
$E$ depends on the size of the norms, in a first approximation we can use the
formula $E = c \cdot L_Q(1/3, (4/9)^{1/3})$ where $c$ is a constant chosen such that the
formula fits the value of $E$ in the example of Aoki et al.

Fig. 3. Comparison of $C_{\mathrm{NFS}}$ (in dashed blue), $C_{\mathrm{STNFS}}$ (in black) and $C_{\mathrm{JP}}$ (in
dash-dotted red) in $\mathbb{F}_{p^n}$ with $n = 2$, for $d$-SNFS primes. Vertical axis: bitlength of the
norms' product; horizontal axis: bitlength of $p^n$ (Color figure online).

Fig. 4. Comparison of $C_{\mathrm{NFS}}$ (in dashed blue), $C_{\mathrm{STNFS}}$ (in black) and $C_{\mathrm{JP}}$ (in
dash-dotted red) in $\mathbb{F}_{p^n}$ with $n = 3$, for $d$-SNFS primes. Vertical axis: bitlength of the
norms' product; horizontal axis: bitlength of $p^n$ (Color figure online).

Fig. 5. Comparison of $C_{\mathrm{NFS}}$ (in dashed blue), $C_{\mathrm{STNFS}}$ (in black) and $C_{\mathrm{JP}}$ (in
dash-dotted red) in $\mathbb{F}_{p^n}$ with $n = 4$, for $d$-SNFS primes. Vertical axis: bitlength of the
norms' product; horizontal axis: bitlength of $p^n$ (Color figure online).

Fig. 6. Comparison of $C_{\mathrm{NFS}}$ (in dashed blue), $C_{\mathrm{STNFS}}$ (in black) and $C_{\mathrm{JP}}$ (in
dash-dotted red) in $\mathbb{F}_{p^n}$ with $n = 5$, for $d$-SNFS primes. Vertical axis: bitlength of the
norms' product; horizontal axis: bitlength of $p^n$ (Color figure online).
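
The calibration of $c$ can be sketched as follows. We assume the usual definition $L_Q(\alpha, c) = \exp\bigl(c (\ln Q)^{\alpha} (\ln\ln Q)^{1-\alpha}\bigr)$ and take as reference point $\log_2 E = 30.4$ at $\log_2 Q = 1039$, the values attributed above to the computation of Aoki et al.; the function names and default arguments are ours.

```python
import math


def log2_L(log2_Q, alpha, c):
    """log2 of L_Q(alpha, c) = exp(c * (ln Q)^alpha * (ln ln Q)^(1 - alpha))."""
    ln_q = log2_Q * math.log(2)
    return c * ln_q**alpha * math.log(ln_q) ** (1 - alpha) / math.log(2)


def calibrated_log2_E(log2_Q, ref_log2_Q=1039, ref_log2_E=30.4):
    """log2 E for E = c * L_Q(1/3, (4/9)^(1/3)), with c fitted at the reference."""
    const = (4 / 9) ** (1 / 3)
    log2_c = ref_log2_E - log2_L(ref_log2_Q, 1 / 3, const)
    return log2_c + log2_L(log2_Q, 1 / 3, const)


for bits in (400, 600, 800, 1000):
    print(bits, round(calibrated_log2_E(bits), 1))
```

By construction the fitted curve passes through the reference point, and $\log_2 E$ grows slowly with the size of the field.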

We emphasize that our comparisons are imprecise since they are based only 
on the product of the norms. Nevertheless, one might make two remarks: 

- when d > 3, the two algorithms specialized in fields of SNFS characteristic 
have smaller norms than those of NFS when factoring RSA numbers; 

- when d > 4, STNFS is an important challenger for the Joux-Pierrot algorithm. 


6 Cryptographic Consequences 

The number field sieve algorithm is still far from being fully understood, in
particular for extension fields that are so important for pairing-based cryptography.
In the past few years, several improvements have been made in the asymptotic
complexities in various scenarios, leading in particular to an $L(1/3, \sqrt[3]{32/9})$
complexity for small degree extensions of SNFS-prime fields, which are common in
pairing-friendly constructions.

We have shown that, in this setting, an old NFS variant due to Schirokauer
could compete with, and probably outperform, the algorithm of Joux-Pierrot.
We acknowledge that the comparison is not perfect since it is based on a model
where the efficiency is directly linked to the size of the product of the norms of the
elements that have to be tested for smoothness. Still, in some cases, the difference
is large enough (a few dozen bits) that we are confident that this
should translate into a significant practical difference.

Of course, only a careful implementation of both algorithms could confirm
this. Unfortunately, this goes way beyond the scope of this paper. As far as
we know, Joux-Pierrot's algorithm has not been used so far for a record-setting
computation, and Schirokauer's TNFS would require even more implementation
work to handle the sieve in higher dimension. And since doing experiments
with non-optimized implementations and small field sizes could lead to highly
misleading conclusions, we preferred to keep this for future work.

7 Appendix: Technicalities 

7.1 Special-q Sieving 

In practice for prime fields the relation collection phase is split into subtasks
following the so-called special-q sieving strategy. It is expected, but not so obvious,
that this technique can be adapted to the case of TNFS.

The General Case. Given a prime ideal $\mathfrak{Q}$ of $K_f$ (resp. of $K_g$), the special-q
algorithm collects (most of) the coprime pairs $(a, b) \in \mathbb{Z}[\iota]^2$ which satisfy

- $a - b\alpha_f \equiv 0 \bmod \mathfrak{Q}$;

- $N_{K_f/\mathbb{Q}}(a - b\alpha_f) / N_{K_f/\mathbb{Q}}(\mathfrak{Q})$ and $N_{K_g/\mathbb{Q}}(a - b\alpha_g)$ are $B$-smooth,

and which have coefficients bounded by $N_{K_f/\mathbb{Q}}(\mathfrak{Q})^{1/(2n)} I$ for a parameter $I$.

In the main lines, the sieving is done by Algorithm 1, where a key role is
played by the lattice $L_{\mathfrak{Q}}$ of $(a, b)$-pairs such that $\mathfrak{Q} \mid a - b\alpha_f$:

$$L_{\mathfrak{Q}} = \Bigl\{ (a_0, \ldots, a_{n-1}, b_0, \ldots, b_{n-1}) \in \mathbb{Z}^{2n} \;\Big|\; \Bigl(\sum_{k=0}^{n-1} a_k \iota^k\Bigr) - \alpha_f \Bigl(\sum_{k=0}^{n-1} b_k \iota^k\Bigr) \equiv 0 \bmod \mathfrak{Q} \Bigr\}.$$


Algorithm 1. Special-q task

1: Compute an LLL-reduced basis $u^{(1)}, \ldots, u^{(2n)}$ of $L_{\mathfrak{Q}}$, and for each $k$ define the
pair $(a^{(k)}, b^{(k)})$ by $a^{(k)} = \sum_{i=0}^{n-1} u^{(k)}_i \iota^i$ and $b^{(k)} = \sum_{i=0}^{n-1} u^{(k)}_{n+i} \iota^i$.

2: Initialize an array indexed by $(i_1, \ldots, i_{2n}) \in \prod_{k=1}^{2n} [-I, I]$ with the value of
$\log_2 N_{K_f/\mathbb{Q}}(a - b\alpha_f)$ where
$a = \sum_{k=1}^{2n} i_k a^{(k)}$ and $b = \sum_{k=1}^{2n} i_k b^{(k)}$.

3: For each $\mathfrak{l}$ in $\mathcal{F}_f$ update the entries of the array such that $a - b\alpha_f \in \mathfrak{l}$.

4: Collect yield(f), the coprime pairs $(a, b)$ associated to entries of the array with
value less than a given threshold.

5: Repeat Steps 2-4 with $f$ replaced by $g$, and collect yield(g).

6: return yield(f) $\cap$ yield(g)


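In more detail, if $\mathfrak{Q} = \langle \mathfrak{q}, \alpha_f - \rho_{\mathfrak{Q}}(\iota) \rangle$ and $\mathfrak{q} = \langle q, \varphi_{\mathfrak{q}}(\iota) \rangle$, we can assume that
$\varphi_{\mathfrak{q}}$ is monic and define the matrix

$$
M_{\mathfrak{Q}} = \left(\begin{array}{ccc|ccc}
q & & & & & \\
 & \ddots & & & & \\
 & & q & & 0 & \\
\multicolumn{3}{c|}{\mathrm{vector}(\varphi_{\mathfrak{q}}) \quad 0 \ \cdots \ 0} & & & \\
\multicolumn{3}{c|}{\ddots} & & & \\
\multicolumn{3}{c|}{0 \ \cdots \ 0 \quad \mathrm{vector}(\varphi_{\mathfrak{q}})} & & & \\ \hline
\multicolumn{3}{c|}{\mathrm{vector}(\rho_{\mathfrak{Q}}(\iota))} & 1 & & \\
\multicolumn{3}{c|}{\mathrm{vector}(\rho_{\mathfrak{Q}}(\iota)\,\iota)} & & \ddots & \\
\multicolumn{3}{c|}{\vdots} & & & \\
\multicolumn{3}{c|}{\mathrm{vector}(\rho_{\mathfrak{Q}}(\iota)\,\iota^{n-1})} & & & 1
\end{array}\right).
$$

One can check that the rows of $M_{\mathfrak{Q}}$ form a basis of $L_{\mathfrak{Q}}$ and that
$\det(M_{\mathfrak{Q}}) = q^{\deg \varphi_{\mathfrak{q}}} = N_{\mathbb{Q}(\iota)/\mathbb{Q}}(\mathfrak{q}) = N_{K_f/\mathbb{Q}}(\mathfrak{Q})$ and $\dim L_{\mathfrak{Q}} = 2n$. Then, the coefficients
of the shortest vector in an LLL-reduced basis have size about $N_{K_f/\mathbb{Q}}(\mathfrak{Q})^{1/(2n)}$.
We make the heuristic assumption that for a large proportion of ideals $\mathfrak{Q}$, all the
vectors in the reduced basis have coefficients of this size. Then, the coefficients
of the $(a, b)$ pairs visited during Steps 3-4-5 of Algorithm 1 are approximately
equal to $I \cdot N_{K_f/\mathbb{Q}}(\mathfrak{Q})^{1/(2n)}$.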

The critical part of Algorithm 1 is Step 4, where we need to solve a problem
that Pollard [32] asked in the case $m = 2$.

Problem 1. Compute the intersection of a sub-lattice of $\mathbb{Z}^m$ with an interval
product $\prod_{i=1}^{m} I_i$.

Since the dimension is fixed or small enough, we can use a generic lattice enu- 
meration algorithm like the Kannan-Fincke-Pohst algorithm. In the case m = 2, 
Franke and Kleinjung [25, Appendix A] gave an elegant algorithm that proved 
very efficient in practice. Extending this algorithm to higher dimension is still 
an open problem. 


The Particular Case of Gaussian Integers. When $h = x^2 + 1$ and $\iota = i$,
we have a series of advantages. First of all, we have $\deg(h) = n = 2$, so the
combinatorial overhead $C(n, d)$ in Theorem 3 is small. Secondly, the ring $\mathbb{Z}[i]$ is
Euclidean, so that we can speed up Step 1 of Algorithm 1.

Lemma 8. Let $q$ and $r$ be two elements of $\mathbb{Z}[i]$ such that $q$ is irreducible and
$r$ is not divisible by $q$. Assume that $\mathfrak{Q} = \langle q, \alpha_f - r \rangle$ is a prime ideal of $K_f$.
Let $(u_j, v_j, d_j)_{j \geq 0}$ be the sequence of Bézout coefficients such that $u_j q + v_j r = d_j$,
obtained during the Extended Euclidean Algorithm (EEA). Let $j \geq 0$ be an
integer. For $k = 1, 2, 3, 4$ we set

$$(a^{(1)}, b^{(1)}) = (d_j, v_j), \qquad (a^{(2)}, b^{(2)}) = (i d_j, i v_j),$$

$$(a^{(3)}, b^{(3)}) = (d_{j+1}, v_{j+1}), \qquad (a^{(4)}, b^{(4)}) = (i d_{j+1}, i v_{j+1}),$$

and define $u^{(k)} = (\mathrm{Re}(a^{(k)}), \mathrm{Im}(a^{(k)}), \mathrm{Re}(b^{(k)}), \mathrm{Im}(b^{(k)}))$. Then the vectors $u^{(1)}$,
$u^{(2)}$, $u^{(3)}$, $u^{(4)}$ form a basis of the lattice $L_{\mathfrak{Q}}$.

Proof. Note first that if two elements $e_1, e_2$ form a basis for a $\mathbb{Z}[i]$-module $M$,
then the set $\{e_1, i e_1, e_2, i e_2\}$ is a basis of $M$ seen as a $\mathbb{Z}$-module. We apply this
fact to $M = \{(a, b) \in \mathbb{Z}[i] \times \mathbb{Z}[i] \mid a - br \equiv 0 \bmod q\}$, so it is sufficient to show
that $(d_j, v_j)$ and $(d_{j+1}, v_{j+1})$ form a basis of $M$ when seen as a $\mathbb{Z}[i]$-module.

By construction of the sequence $(u_j, v_j, d_j)_{j \geq 0}$, there exist invertible matrices
$J_1, J_2, \ldots \in \mathrm{GL}(\mathbb{Z}[i], 2)$ so that, for all $j \geq 1$,

$$\begin{pmatrix} u_{j+1} & v_{j+1} & d_{j+1} \\ u_j & v_j & d_j \end{pmatrix} = J_j \begin{pmatrix} u_j & v_j & d_j \\ u_{j-1} & v_{j-1} & d_{j-1} \end{pmatrix}.$$

Therefore, for all $j$, the pairs $(d_j, v_j)$ and $(d_{j+1}, v_{j+1})$ span the same $\mathbb{Z}[i]$-module.
In particular, for $j = 0$, we have $(d_0, v_0) = (q, 0)$ and $(d_1, v_1) = (r, 1)$, which
is a basis of $M$, so that any pair in the sequence spans $M$. Finally, a pair
$(a, b) \in \mathbb{Z}[i] \times \mathbb{Z}[i]$ is in $M$ if and only if the vector $u = (\mathrm{Re}(a), \mathrm{Im}(a), \mathrm{Re}(b), \mathrm{Im}(b))$
is in the lattice $L_{\mathfrak{Q}}$, which completes the proof.

We interrupt the execution of EEA at its middle point, i.e. for the least index
$j$ where $N_{\mathbb{Q}(i)/\mathbb{Q}}(d_j) < \sqrt{N_{\mathbb{Q}(i)/\mathbb{Q}}(q)}$. As in the classical variant of NFS, we make
the heuristic assumption that for all $k \in [1, 4]$, we have $\|(a^{(k)}, b^{(k)})\|_\infty \approx \sqrt{|q|}$.
Hence, we replaced Step 1 in Algorithm 1 by EEA in $\mathbb{Z}[i]$.
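
The construction of Lemma 8 is easy to prototype. The following minimal sketch represents Gaussian integers as pairs of Python integers, runs the Euclidean algorithm with rounded division until the norm of the remainder drops below $\sqrt{N(q)}$, and returns the four basis vectors. The helper names and the sample values $q = 20 + 7i$ (of prime norm 449) and $r = 123 + 45i$ are ours, chosen only for illustration.

```python
def gmul(x, y):
    """Product of Gaussian integers given as (re, im) pairs."""
    return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])


def gsub(x, y):
    return (x[0] - y[0], x[1] - y[1])


def gnorm(x):
    return x[0] * x[0] + x[1] * x[1]


def gdivround(x, y):
    """Gaussian integer nearest to x / y (y nonzero), in exact arithmetic."""
    n = gnorm(y)
    num = gmul(x, (y[0], -y[1]))                  # x * conj(y)
    return ((2 * num[0] + n) // (2 * n), (2 * num[1] + n) // (2 * n))


def euclid_step(prev, cur):
    """One EEA step on rows (d, v), preserving d = v*r (mod q)."""
    quo = gdivround(prev[0], cur[0])
    return (gsub(prev[0], gmul(quo, cur[0])), gsub(prev[1], gmul(quo, cur[1])))


def special_q_basis(q, r):
    """Basis of {(a, b) : a - b*r = 0 mod q} as vectors (Re a, Im a, Re b, Im b)."""
    prev, cur = (q, (0, 0)), (r, (1, 0))          # (d_0, v_0) and (d_1, v_1)
    while gnorm(cur[0]) ** 2 >= gnorm(q):         # stop once N(d_j) < sqrt(N(q))
        prev, cur = cur, euclid_step(prev, cur)
    if cur[0] == (0, 0):
        raise ValueError("q and r are not coprime")
    nxt = euclid_step(prev, cur)                  # row of index j + 1
    times_i = lambda x: (-x[1], x[0])
    pairs = [cur, (times_i(cur[0]), times_i(cur[1])),
             nxt, (times_i(nxt[0]), times_i(nxt[1]))]
    return [(a[0], a[1], b[0], b[1]) for a, b in pairs]


q, r = (20, 7), (123, 45)
for vec in special_q_basis(q, r):
    print(vec)
```

Each returned vector satisfies $a - br \equiv 0 \bmod q$, and the determinant of the four vectors equals $N(q)$ up to sign, as expected for a basis of $L_{\mathfrak{Q}}$.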

Another advantage of $\mathbb{Z}[i]$ is that we can easily deal with the roots of unity.
Indeed, the roots of unity have a bad effect on the sieve since, for any pair $(a, b)$
found during the sieve, one will also find $(ua, ub)$ for all roots of unity $u$. For a
practical implementation one might prefer to choose $h$ so that there are no roots
of unity other than $\pm 1$.

In the case $h = x^2 + 1$, we can impose that we have no duplicates due to roots
of unity. For this, we modify Step 2 of Algorithm 1 so that the indices run in

$$(i_1, i_2, i_3, i_4) \in [0, I] \times [0, I] \times [-I, I] \times [-I, I]$$

instead of $[-I, I]^4$. By doing so we divide by four the number of pairs $(a, b)$
sieved in the special-q task associated to $\mathfrak{Q}$. Indeed, if a pair $(a, b)$ is written as
$(a, b) = \sum_{k=1}^{4} i_k (a^{(k)}, b^{(k)})$, then when we multiply $(a, b)$ by roots of unity we
use the following indices, where exactly one of the pairs has $i_1, i_2 \geq 0$:

$$(a, b) \leftrightarrow (i_1, i_2, i_3, i_4) \qquad (-a, -b) \leftrightarrow (-i_1, -i_2, -i_3, -i_4)$$

$$(ia, ib) \leftrightarrow (-i_2, i_1, -i_4, i_3) \qquad (-ia, -ib) \leftrightarrow (i_2, -i_1, i_4, -i_3)$$

7.2 Using a Cyclotomic Field for Q(t) 

Although we found no practical advantage for cyclotomic fields other than $\mathbb{Q}(i)$,
they allow us to give a proof of existence for the polynomial $h$, as required in the
TNFS construction of Sect. 3.1.

Theorem 9 ([1], Prop. 3). Assuming the Extended Riemann Hypothesis
(ERH), there is a constant $c > 0$ such that for all $p, n \in \mathbb{N}$, $p$ prime and
$\gcd(n, p) = 1$, there exists a prime $q$ such that $q \equiv 1 \pmod{n}$, $q < c n^4 \log(pn)^2$
and $p$ is inert in the unique subfield $K$ of $\mathbb{Q}(\zeta_q)$ with $[K : \mathbb{Q}] = n$.

Corollary 10. Under ERH, there exists a constant $c > 0$ such that, for any
integer $n$ and any prime $p > n$, there exists an effectively constructible
polynomial $h \in \mathbb{Z}[x]$ such that:

- $h$ is irreducible modulo $p$;

- $\|h\|_\infty < (2 c n^4 \log(np)^2)^n$.

Proof. Let $c$ be the constant of the theorem above. Let $q$ be a prime associated
with $p$ and $n$, let $\zeta_q$ be a primitive $q$-th root of unity and $\eta$ a Gaussian period:

$$\eta = \sum_{x \in (\mathbb{F}_q^*)^n} \zeta_q^{x}.$$

If $r_1, \ldots, r_n$ are a set of representatives of $\mathbb{F}_q^*/(\mathbb{F}_q^*)^n$, then the conjugates of
$\eta$ are its images by the morphisms $\sigma_i : \zeta_q \mapsto \zeta_q^{r_i}$. Hence, the minimal polynomial
of $\eta$ over $\mathbb{Q}$ is

$$h = \prod_{i=1}^{n} \bigl(x - \sigma_i(\eta)\bigr).$$

For $k \in [0, n]$, a crude estimate of the $k$-th coefficient of $h$ is $\binom{n}{k} |\eta|^k$, which
is further upper bounded by $2^n (q - 1)^n$, and finally by $(2 c n^4 \log(np)^2)^n$. The
coefficients of $h$ add a factor $\|h\|_\infty^{d(n-1)}$ in Eq. (1). It remains negligible, i.e.
$L_Q(2/3)^{o(1)}$, when $n^2 = o(d)$ or equivalently when $p = L_Q(a)$ for $a > 5/6$.


7.3 The Waterloo Improvement 

At the beginning of the individual logarithm stage, the smoothing step can be
sped up in practice using the continued fraction method, also called “Waterloo
improvement” 1 . It replaces the probability that an integer of size $S$ is
smooth by the probability that two numbers of size $\sqrt{S}$ are simultaneously
smooth. This does not change the complexity, unless we make the $o(1)$ expression
explicit, but has a measurable practical impact. A TNFS equivalent for the
continued-fraction method is to LLL-reduce the lattice generated by the rows of
the matrix


(p 

\ 


0 

p 


z 

1 

\ L n ~ 1 Z 

1 / 


where z is a lift of the target element of the finite field, and z, . . ., t n ~ 1 z are 
written by their coordinates as elements of Q(y). Since det M{z) = p n = Q, a 
short vector (tx 0 , . . . , T/ n _i, t;o, • • • , v n -i) has coordinates of size « Q V 2n . The 
quotient u/v where u = X]/c=o Ukik an h v — ^2k= o Vktk 1S an e l emen f °f Q(^) 
that reduces to the same element of F p n as z. Therefore, instead of testing for 
smoothness the norm of z, of size S = Q, we test whether the norms of u and v, 
both of size are smooth. 
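
For intuition, consider the prime-field case n = 1, where the lattice is spanned by the rows (p, 0) and (z, 1) and the reduction step coincides with the classical continued-fraction (extended Euclidean) computation. The following is a minimal sketch under that assumption only; it does not perform the LLL reduction needed for n > 1, and the function name `waterloo_split` is hypothetical.

```python
from math import isqrt


def waterloo_split(z, p):
    """Return (u, v) with u ≡ v*z (mod p), v != 0 and |u|, |v| <= isqrt(p).

    Runs the extended Euclidean algorithm on (p, z) and stops as soon as
    the remainder drops to at most sqrt(p).
    """
    bound = isqrt(p)
    r0, t0 = p, 0          # invariant: r0 ≡ t0 * z (mod p)
    r1, t1 = z % p, 1      # invariant: r1 ≡ t1 * z (mod p)
    while r1 > bound:
        quo = r0 // r1
        r0, r1 = r1, r0 - quo * r1
        t0, t1 = t1, t0 - quo * t1
    return r1, t1


if __name__ == "__main__":
    p = 1000003
    z = 123456
    u, v = waterloo_split(z, p)
    # u / v represents z modulo p, and both parts are about sqrt(p) in size
    print(u, v, (u - v * z) % p == 0)
```

Instead of testing z (of size about p) for smoothness, one would test u and v, each of size about the square root of p.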
Abstract. Hash functions are often constructed based on permutations
or blockciphers, and security proofs are typically done in the ideal permutation
or cipher model. However, once these random primitives are
instantiated, vulnerabilities of these instantiations may nullify the security.
At ASIACRYPT 2007, Knudsen and Rijmen introduced known-key
security of blockciphers, which gave rise to many distinguishing attacks
on existing blockcipher constructions. In this work, we analyze the impact
of such attacks on primitive-based hash functions. We present and formalize
the weak cipher model, which captures the case where a blockcipher has
a certain weakness but is perfectly random otherwise. A specific instance
of this model, considering the existence of sets of B queries whose XOR
equals 0 at bit-positions C, where C is an index set, covers a wide range
of known-key attacks in the literature. We apply this instance to the PGV
compression functions, as well as to the Grøstl (based on two permutations)
and Shrimpton-Stam (based on three permutations) compression
functions, and show that these designs do not seriously succumb to any
differential known-key attack known to date.


Keywords: Hash functions • Known-key security • Knudsen-Rijmen •
PGV • Grøstl • Shrimpton-Stam • Collision resistance • Preimage
resistance


1 Introduction 

Cryptographic hash functions are conventionally built on top of compression 
functions, and in turn on one or more blockciphers. Since the first appearance 
of such a compression function, $F(h, m) = \mathrm{DES}_m(h)$, by Rabin [49] in the late 70s,
many blockcipher-based functions appeared in the literature [23,25,29,30,40,43, 
48,58]. These all enjoy security proofs in the ideal model, where the underlying 
ciphers are assumed to behave ideally. Characteristic to these designs is that the
key input to the cipher depends on the input to the compression function, and 
that the key scheduling needs to be sufficiently strong. For instance, Biryukov 
et al. [6] derived a related-key attack on AES and claimed that it invalidates the 
security of the Davies-Meyer compression function when the underlying primitive 
is instantiated with AES. A more recent approach to compression function design 
is to base them on a limited number of permutations [8,41,42,51,57]. These
permutations could be designed from scratch, or obtained by fixing a small set
of keys and using a blockcipher for these keys only. Related- or chosen-key attacks
on blockciphers do not help the adversary here, as the keys are fixed. 

Known-Key Security of Blockciphers. While in the classical security mod- 
els for blockciphers the key is secret and randomly drawn and the adversary’s 
target is to distinguish the instantiation of the cipher from a random permuta- 
tion (also known as (strong) pseudorandom permutation security), this notion 
does not apply if the key is known to the adversary. At ASIACRYPT 2007, 
Knudsen and Rijmen [27] introduced known-key security of blockciphers. Here, 
the key is presumed known, and the adversary succeeds in distinguishing if it 
identifies a structural property of the cipher. Andreeva et al. [1] proposed a way 
to formalize the known-key security of blockciphers based on the underlying 
primitives. The model is derived from the indifferentiability framework [37] and 
hence all composition results carry over. Intuitively: suppose some cryptosystem 
F is proven to achieve a certain level of security in the ideal permutation model, 
and consider F' to be F with the permutations replaced by independent blockci- 
pher instantiations. Then, F' achieves the same level of security as F, up to the 
known-key indifferentiability bound of the underlying blockciphers.

In [1], several blockcipher constructions are proven to be known-key indiffer- 
entiable, such as the multiple Even-Mansour cipher and 14 rounds of balanced 
Feistel with random functions (using a result of Holenstein et al. [24]). For such 
ciphers, the above approach works well, although for Even-Mansour the com- 
position is trivial (one essentially replaces an ideal permutation by an ideal 
permutation) and for Feistel with 14 rounds security is only guaranteed up to 
$2^{n/32}$ queries, where $n$ is the state size of the cipher.

Known-Key Attacks on Blockciphers. Knudsen and Rijmen also demonstrated
that the Feistel network on $n$ bits with 7 rounds (called “Feistel_7”) is not
known-key indifferentiable [1,27]: an adversary can generically find $2^{n/2}$
plaintext/ciphertext tuples $(m, c)$ and $(m', c')$ satisfying $\mathrm{Ri}_{n/2}(m \oplus c \oplus m' \oplus c') = 0$
(where $\mathrm{Ri}_r(x)$ outputs the $r$ rightmost bits of $x$). This result has led to a
wave of other known-key attacks on practical constructions, including
generalized/extended variants of Feistel [1,27,47,53,56], reduced versions of AES
or Rijndael [22,27,38,44,52], reduced variants of the blockciphers underlying
SHA-2 and SHA-3 finalists BLAKE and Skein [2,7,31,34,60], and many more
[3,11,12,14,17,18,28,33,46,47,54,55]. This paper will mostly be concerned with
differential known-key attacks, including rebound- and boomerang-based attacks
(the majority of above-mentioned attacks). We highlight two results that are
among the best-known ones and that exemplify the idea of the other attacks.
Gilbert and Peyrin [22] used the rebound technique [39] to derive a known-key
attack on 8 rounds of AES (called “AES_8”). It starts from the middle, and results
in a differential trail with four active words in the beginning, and four at the end.
These active words are overlapping at two positions, hence one could consider
this result as two tuples $(m, c)$ and $(m', c')$ satisfying $m \oplus c \oplus m' \oplus c' = 0$ at $10n/16$
bit-positions. The adversary has $2^{15} < 2^{n/8}$ degrees of freedom in the attack, and
for any choice it results in such a tuple with a certain probability. (The bound
of $2^{n/8}$ is used for simplicity later on.) The second attack we highlight is by
Yu et al. [60], who employ the boomerang technique [59] to attack 36 rounds of
the blockcipher Threefish-512 (called “Threefish_36”) used in Skein. This attack
results in four tuples $(m^1, c^1), \ldots, (m^4, c^4)$ satisfying $m^1 \oplus c^1 \oplus \cdots \oplus m^4 \oplus c^4 = 0$. The
adversary has $2^n$ degrees of freedom, but any trial succeeds with probability
approximately $2^{-454}$. Therefore, the expected number of solutions is about
$2^{n-454} < 2^{n/8}$. This attack is in fact a known-related-key attack, where a fixed
difference in the key exists. For simplicity, we condone this, observing that an
attack with no key difference must logically be harder.

In any of these cases, the traditional and commonly employed ideal
cipher/permutation model falls short: results achieved in this model do not
necessarily hold if the primitives are instantiated with Feistel_7, AES_8,
Threefish_36, or any other known-key distinguishable cipher.


1.1 Our Contributions 

In their seminal work, Knudsen and Rijmen state: “In some cases blockciphers 
are used with a key that is known to the adversary, and at least to a certain 
extent, the key is under the adversary’s control. Our attacks are quite relevant to 
this case.” We investigate the fundamental question of whether known-key attacks
invalidate the security of primitive-based hash functions, but we do so in a much
more general way. At a high level, we present a model that goes beyond the traditional
ideal cipher model as well as the principle of known-key attacks, and that
allows one to generically analyze the impact of various weaknesses of blockciphers
on various blockcipher- and permutation-based cryptosystems.

Model. A naive approach to analyzing the impact of known-key attacks would
be to simply plug a certain blockcipher construction into a hash function and
to analyze its security, but this would be a devious and complex combinatorial
task: for a function based on $r$ permutations, plugging Feistel_7 into it would lead
to $7r$ underlying primitive calls. Note that proving security of the Feistel construction
itself is already extraordinarily hard [16,24,32]. Instead, we model the
blockciphers in such a way that they behave randomly, except that an adversary
can exploit the particular relation. More formally, we pose a certain predicate $\Phi$
and we draw blockciphers randomly from the set of all ciphers that comply
with predicate $\Phi$. Throughout, we refer to this model as the “weak cipher model
(WCM).” It corresponds to the ideal cipher model if $\Phi$ is trivial.

We present an explicit description of a random weak cipher for the
case where $\Phi$ implies for each key $k$ the existence of $A$ sets of $B$ queries
$\{(k, m^1, c^1), \ldots, (k, m^B, c^B)\}$ that comply with a certain condition $\varphi$. These
ciphers are modeled to have three interfaces: forward queries, inverse queries,
and predicate queries. Forward and inverse queries are as usual; on a predicate
query, an adversary is given a set of $B$ queries satisfying $\varphi$. Multiple technicalities
are involved in this formalization. Most importantly, predicate $\Phi$ applies to
tuples of queries, rather than single queries only, and some query responses may
have a reduced entropy.

Above-mentioned known-key attacks are covered by our model if the condition
$\varphi$ states for some $C \subseteq \{1, \ldots, n\}$ that

$$\mathrm{Bits}_C\bigl(m^1 \oplus c^1 \oplus \cdots \oplus m^B \oplus c^B\bigr) = 0, \qquad (1)$$

where $\mathrm{Bits}_C(x)$ outputs a string consisting of all bits of $x$ whose index is in $C$. (In
fact, our model is much more general: above-mentioned attacks aim to generate
only one relation, while we allow an adversary to see multiple relations.) The
value $A$ usually depends on $n$ and $C$ is regularly a large subset. We consider $B$
being a relatively small number (independent of $n$). For the above-mentioned
attack on Feistel_7, $A = 2^{n/2}$, $B = 2$, and $C$ corresponds to the rightmost $n/2$
bits. Similarly, the attacks on AES_8 (for $A = 2^{n/8}$, $B = 2$, and $C$ a certain
set of size $10n/16$) and Threefish_36 (for $A = 2^{n/8}$, $B = 4$, and $C = \{1, \ldots, n\}$)
are covered, and so are almost all known differential (rebound- or boomerang-based)
known-key attacks. We remark that, on the other hand, the predicate is
not well-suited for integral-based known-key attacks: upon a predicate query an
attacker would receive $B \approx 2^n$ queries.
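
As an illustration of condition (1), the following sketch checks whether a set of B plaintext/ciphertext pairs satisfies the XOR relation on an index set C. It assumes that bit strings are encoded as Python integers and that bit index 1 is the rightmost bit, in line with the convention that Ri_r returns the r rightmost bits; the helper names `bits_c` and `satisfies_condition` are hypothetical.

```python
def bits_c(x, C):
    """Return the bits of x at the 1-based positions in C (1 = rightmost bit),
    listed from the highest selected index down to the lowest."""
    return tuple((x >> (i - 1)) & 1 for i in sorted(C, reverse=True))


def satisfies_condition(queries, C):
    """Check Bits_C(m^1 xor c^1 xor ... xor m^B xor c^B) = 0 for a list of (m, c) pairs."""
    acc = 0
    for m, c in queries:
        acc ^= m ^ c
    return all(bit == 0 for bit in bits_c(acc, C))


if __name__ == "__main__":
    # Toy 8-bit example with B = 2: the total XOR is 0xF0,
    # so the four rightmost bits vanish but the four leftmost do not.
    pairs = [(0xA5, 0x5A), (0xFF, 0xF0)]
    print(satisfies_condition(pairs, {1, 2, 3, 4}))  # rightmost four bits
    print(satisfies_condition(pairs, {5, 6, 7, 8}))  # leftmost four bits
```

With C = {1, ..., n/2} and B = 2 this is the shape of the Feistel_7 relation discussed above.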

The weak cipher model is similar to an approach followed by Bresson 
et al. [15] for the indifferentiability analysis of the SHA-3 candidate Shabal if 
the underlying blockcipher shows some non-random behavior, and by Bouillaguet 
et al. [13] to analyze the indifferentiability security of SIMD when the underly- 
ing compression function is distinguishable from a random function. However, in 
both approaches, the underlying biased primitives were relatively easy to model. 
For instance in [15] (using our terminology), predicate $\Phi$ is a relation that holds
for single queries only, and not for combinations of queries. This considerably
simplifies the analysis: one can derive a bias $\beta$ to measure the distance between
primitive responses and fully random responses, and consider oracle responses
to be drawn from a set of size at least $2^{n-\beta}$, and the original indifferentiability
analysis carries over with minor modifications. The predicate used in the analysis
in [13], on the other hand, does apply to tuples of queries, but the model can
simply be described using two sampling algorithms, and an adversary cannot
hit a weak pair by accident (which is possible in our analysis). Liskov [35] used
a similar approach to prove indifferentiability security of the zipper hash if the
underlying compression function is invertible up to a certain degree. However,
the analysis is significantly simpler, as this primitive can be perfectly modeled.
We finally remark that Katz et al. [26] analyze the impact of related-key
attacks on blockciphers to hash functions. However, in their model, the differences
$\Delta k, \Delta x, \Delta y$ are fixed, an ideal cipher is generated for half of the key space,
and for the other half the cipher is adjusted as $E_k(x) = E_{k \oplus \Delta k}(x \oplus \Delta x) \oplus \Delta y$.
This primitive can be easily modeled, but is also too generous to the attacker.

To our knowledge, this is the first attempt to formally analyze the effect 
of a wide class of blockcipher attacks on higher level cryptographic functions. 
Nonetheless, the weak cipher model is in essence still a model: we use an abstraction
of the cryptanalytic known-key attacks in such a way that the ideal cipher
model can be relaxed to cope with them. A further discussion on the accuracy of the
model is given in Sect. 7.

Table 1. Security results for the PGV, Grøstl, and Shrimpton-Stam compression functions
in the weak cipher model. Ideal cipher/permutation model bounds match the ones
of $B \ge 3$. All results are tight except for the case ($B = 1$, $|C| > n/2$) for Shrimpton-Stam.

                     PGV                            Grøstl                         Shrimpton-Stam
B     |C|            collision       preimage       collision       preimage       collision       preimage
1     ≤ n/2          2^{(n-|C|)/2}   2^{n-|C|}      2^{(n-|C|)/4}   2^{(n-|C|)/2}  2^{(n-|C|)/2}   2^{n/2}
1     > n/2          2^{(n-|C|)/2}
2     ≤ n/2          2^{n/2}         2^n            2^{n/4}         2^{n/2}        2^{n/2}         2^{n/2}
2     > n/2          2^{n-|C|}       2^n            2^{(n-|C|)/2}   2^{n/2}        2^{n-|C|}       2^{n/2}
≥ 3   arbitrary      2^{n/2}         2^n            2^{n/4}         2^{n/2}        2^{n/2}         2^{n/2}


Application to Blockcipher-Based Hash Functions. Preneel, Govaerts, 
and Vandewalle (PGV) [48] classified the 64 most basic ways of constructing 
a 2n-to-n-bit compression function from a blockcipher with n-bit key and n- 
bit state, and claimed security of 12 of them. A formal security analysis of 
these functions in the ICM has been performed by Black et al. [9], and later by 
Duo and Li [19], Stam [58], and Black et al. [10]. In more detail, in the ICM 
these constructions achieve tight collision security up to about 2 n / 2 queries and 
preimage security up to about 2 n queries. Baecher et al. [4] recently showed that 
the 12 secure PGV functions can be divided into two classes, in such a way that 
if a primitive makes one function secure it makes the entire class secure. 

As first application of our model, we consider the PGV compression functions 
in the WCM and derive collision and preimage bounds for general (A, B,C). 
A schematic summary of the results for various B and C is given in Table 1 
(we remark that A is merely a technical parameter that has no influence on 
the results). We also show that the bounds are optimal, by providing matching 
attacks. Some of these attacks are similar to methods used in [27,53,56] to detect 
(near-)collisions in certain PGV modes of operation using known-key attacks.

Application to Permutation-Based Hash Functions. We also apply the
WCM to permutation-based compression functions. This is particularly interesting
for two reasons: (i) it allows us to understand the impact of distinguishers on
permutations that are used in hash functions, and (ii) a blockcipher with a fixed
and known key is a permutation and can be used as such. In more detail, we consider
the Grøstl compression function [21] and the permutation-based equivalent
of the Shrimpton-Stam compression function [57] (see also Fig. 4). In the IPM,
the former is proven to achieve collision security up to $2^{n/4}$ queries, where $n$ is
the state size, and preimage security up to $2^{n/2}$ [20]. Rogaway and Steinberger
[51] showed via an automated analysis that the latter function is collision and
preimage resistant up to $2^{n/2}$ queries (asymptotically). This has been confirmed
in the generalized work of Mennink and Preneel [41].

A summary of our findings for the Grøstl and Shrimpton-Stam compression
functions in the WCM is given in Table 1. All results are tight, except for the case
($B = 1$, $|C| > n/2$) for Shrimpton-Stam, for which we leave proving tightness as
an open problem. We remark that the analysis for these schemes is much more
demanding as multiple primitives are involved.

Impact. An application of our formalization to the PGV functions and various
permutation-based functions shows that these achieve a comparable level of security
in the ideal and weak cipher model for a spectrum of choices for $(A, B, C)$.
This result particularly implies that most relevant rebound-based (including
[12,22,28,38,52,53,56]) and boomerang-based (including [2,7,31,54,60]) known-key
attacks known to date do not invalidate the security of such functions, or
only have a little effect. For instance, the above-discussed attack on Feistel_7 satisfies
$B = 2$ and $|C| = n/2$ and it does not affect the security; similarly for
Threefish_36 for which $B = 4$. The attack on AES_8 is covered for $B = 2$ and
$|C| = 10n/16$, which demonstrates a slight security degradation to $2^{6n/16}$ for
the PGV functions, but this may in part be due to our over-generosity to the
adversary. We remark that, even though we focused on collision and preimage
resistance, the techniques can be generalized to other security notions, such as
near-collisions. This may entail differences in the security results.

We stress that these results do not mean that the analyzed functions are 
secure when the underlying permutations are instantiated with, say, Feistel_7
or Threefish_36: it only means that existing known-key attacks, or more general
weaknesses such as relation (1), alone are not sufficient to invalidate the collision
and preimage security of the construction. Indeed, more sophisticated attacks
which are not yet covered by our application of the WCM may still invalidate
the security of certain modes [6]. It remains a challenging open research problem
to generalize the findings to underlying primitives that have multiple or different 
weaknesses. 


1.2 Outline 

In Sect. 2, we formally present the “weak cipher model,” and in Sect. 3 we show 
how it relates to known-key attacks. We apply the model to the PGV functions 
in Sect. 4, to the Grøstl compression function in Sect. 5, and to Shrimpton-Stam
in Sect. 6. We conclude this work in Sect. 7. 

2 Weak Cipher Model 

If $X$ is a set, by $x \xleftarrow{\$} X$ we denote the uniformly random sampling of an element
from $X$. By $X \xleftarrow{\cup} x$, we denote $X \leftarrow X \cup \{x\}$. For a bit string $x$, its bits are
numbered $x = x_{|x|} \cdots x_2 x_1$. If $C \subseteq \{1, \ldots, |x|\}$, the function $\mathrm{Bits}_C(x)$ outputs a
string consisting of all bits of $x$ whose index is in $C$. Abusing notation, $\mathrm{Bits}_{\bar{C}}(x)$
always denotes the remaining bits (technically, $\bar{C} = \{1, \ldots, |x|\} \setminus C$). For $0 \le r \le |x|$,
we consider $\mathrm{Ri}_r(x)$ that outputs the $r$ rightmost bits of $x$. In other words,
$\mathrm{Ri}_r(x) = \mathrm{Bits}_{\{1, \ldots, r\}}(x)$. For a function $f$, by $\mathrm{dom}(f)$ and $\mathrm{rng}(f)$ we denote its
domain and range, respectively.


2.1 Security Model 

For $\kappa \ge 0$ and $n \ge 1$, by $\mathrm{BC}(\kappa, n)$ we denote the set of all blockciphers with $\kappa$-bit
key operating on $n$ bits. If $\kappa = 0$, $\mathrm{BC}(n) := \mathrm{BC}(0, n)$ denotes the set of all $n$-bit
permutations. If $\Phi$ is a predicate, by $\mathrm{BC}[\Phi](\kappa, n)$ we denote the subset of ciphers
of $\mathrm{BC}(\kappa, n)$ that satisfy predicate $\Phi$. For $\pi \in \mathrm{BC}[\Phi](\kappa, n)$, the input-output tuples
are denoted $(k, x, z)$, where $\pi(k, x) = \pi_k(x) = z$ and $\pi^{-1}(k, z) = \pi_k^{-1}(z) = x$.
The key $k$ is omitted in case $\kappa = 0$.

Let $F : \{0,1\}^s \to \{0,1\}^n$ be a compressing function instantiated with
$\ell \ge 1$ primitives from $\mathrm{BC}[\Phi](\kappa, n)$, for some predicate $\Phi$. Throughout, we consider
security of $F$ in an idealized model: we consider an adversary $\mathcal{A}$ that is
a probabilistic algorithm with oracle access to a randomly sampled primitive
$\pi = (\pi_1, \ldots, \pi_\ell) \xleftarrow{\$} \mathrm{BC}[\Phi](\kappa, n)^\ell$. $\mathcal{A}$ is information-theoretic and its complexity
is only measured by the number of queries made to its oracles. The adversary
can make forward and inverse queries to its oracles, and these queries are stored
in a query history $Q$.

A collision-finding adversary $\mathcal{A}$ for $F$ aims at finding two distinct inputs to $F$
that compress to the same range value. In more detail, we say that $\mathcal{A}$ succeeds
if it finds two distinct inputs $X, X'$ such that $F(X) = F(X')$ and $Q$ contains all
queries required for these evaluations of $F$. We define by

$$\mathbf{Adv}_F^{\mathrm{col}}(\mathcal{A}) = \Pr\Bigl(\pi \xleftarrow{\$} \mathrm{BC}[\Phi](\kappa, n)^\ell,\ X, X' \leftarrow \mathcal{A}^{\pi} \;:\; X \ne X' \wedge F(X) = F(X')\Bigr)$$

the probability that $\mathcal{A}$ succeeds in this. By $\mathbf{Adv}_F^{\mathrm{col}}(q)$ we define the maximum
collision advantage taken over all adversaries making $q$ queries.

For preimage resistance, we focus on everywhere preimage resistance [50],
which captures preimage security for every point of $\{0,1\}^n$. Let $Z \in \{0,1\}^n$
be any range value. Then, we say that $\mathcal{A}$ succeeds in finding a preimage if it
obtains an input $X$ such that $F(X) = Z$ and $Q$ contains all queries required for
this evaluation of $F$. We define by

$$\mathbf{Adv}_F^{\mathrm{epre}}(\mathcal{A}) = \max_{Z \in \{0,1\}^n} \Pr\Bigl(\pi \xleftarrow{\$} \mathrm{BC}[\Phi](\kappa, n)^\ell,\ X \leftarrow \mathcal{A}^{\pi}(Z) \;:\; F(X) = Z\Bigr)$$

the probability that $\mathcal{A}$ succeeds, maximized over all possible choices for $Z$. By
$\mathbf{Adv}_F^{\mathrm{epre}}(q)$ we define the maximum (everywhere) preimage advantage taken over
all adversaries making $q$ queries.

If $\Phi$ is a trivial relation, we have $\mathrm{BC}[\Phi](\kappa, n) = \mathrm{BC}(\kappa, n)$, and the above
definitions boil down to security in the ideal cipher model (ICM) if $\kappa > 0$ or
the ideal permutation model (IPM) if $\kappa = 0$. On the other hand, if $\Phi$ is a non-trivial
predicate, it strictly reduces the set $\mathrm{BC}(\kappa, n)$. In this case, we will refer
to the model as the “weak cipher model (WCM),” for both $\kappa > 0$ and $\kappa = 0$.
Very informally, this model still involves random ciphers/permutations, with
the difference that an adversary may exploit a certain additional property. The
modeling of a randomly drawn weak cipher is much more delicate.


2.2 Random Weak Cipher 

For a certain class of predicates, we discuss how to model a randomly drawn
weak cipher $\pi$ from $\mathrm{BC}[\Phi](\kappa, n)$. Let $A, B \in \mathbb{N}$. We will consider predicates $\Phi$
that imply, for every $k \in \{0,1\}^\kappa$, the existence of $A$ sets of $B$ distinct queries
$\{(x^1, z^1), \ldots, (x^B, z^B)\}$ that satisfy $\varphi_k(\{(x^1, z^1), \ldots, (x^B, z^B)\})$ for some condition
$\varphi$ depending on key $k$. The predicate is denoted $\Phi(A, B, \varphi)$. $A$ is merely a
technical parameter, and throughout we assume it is larger than $q$, the number of
oracle calls an adversary can make. This definition of $\Phi(A, B, \varphi)$ is fairly general.
Particularly, predicate $B$-sets may overlap and the condition $\varphi$ can represent any
function on the inputs. We note that $\Phi$ can be easily generalized to tuples of
different length and/or to multiple types of conditions at the same time.

Traditionally, an adversary has only forward $\pi_k(x)$ and inverse $\pi_k^{-1}(z)$ query
access. In order for the adversary to be able to exploit the weakness present
in $\pi$, we give it additional access to $\pi$ via a “predicate query” $\pi_k^{\Phi}(y)$: on input
of $y \in \{1, \ldots, A\}$, the adversary obtains a $B$-set $\{(x^1, z^1), \ldots, (x^B, z^B)\}$ that
satisfies $\varphi_k(\{(x^1, z^1), \ldots, (x^B, z^B)\})$.

A formal description of how to model $\pi \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi)](\kappa, n)$ is given in
Fig. 1. Here, for every $k \in \{0,1\}^\kappa$, $P_k$ is an initially empty list of $\pi_k$-evaluations,
where a regular forward/inverse query adds one element $(x, z)$ to $P_k$ and a $\pi_k^{\Phi}$-query
may add up to $B$ elements. Additionally, $P_k^{\Phi}$ is an initially empty list of
queries to $\pi_k^{\Phi}$. We denote by $\Sigma_k(P_k, P_k^{\Phi}) \subseteq (\{0,1\}^n \times \{0,1\}^n)^B$ the set of all
tuples $\{(x^1, z^1), \ldots, (x^B, z^B)\}$ such that

(i) $x^1, \ldots, x^B$ are pairwise distinct and $z^1, \ldots, z^B$ are pairwise distinct;

(ii) $\forall\, \ell = 1, \ldots, B$ : $x^\ell \in \mathrm{dom}(P_k) \Rightarrow z^\ell = P_k(x^\ell)$ and $z^\ell \in \mathrm{rng}(P_k) \Rightarrow x^\ell = P_k^{-1}(z^\ell)$;

(iii) $\varphi_k(\{(x^1, z^1), \ldots, (x^B, z^B)\})$ holds;

(iv) $\{(x^{\rho(1)}, z^{\rho(1)}), \ldots, (x^{\rho(B)}, z^{\rho(B)})\} \notin \mathrm{rng}(P_k^{\Phi})$ for any permutation $\rho$ on $\{1, \ldots, B\}$.

For a new query $\pi_k^{\Phi}(y)$, the response is then randomly drawn from $\Sigma_k(P_k, P_k^{\Phi})$.
Conditions (i-iii) are fairly self-evident; note particularly that an existing $(x, z) \in P_k$
may appear in multiple predicate queries. Condition (iv) assures that the
drawing from $\Sigma_k(P_k, P_k^{\Phi})$ is not just an old predicate query or a reordering
thereof. The usage of this set $\Sigma_k(P_k, P_k^{\Phi})$ allows for a uniform behavior of $\pi_k^{\Phi}$ for
every $k$, and in general of $\pi \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi)](\kappa, n)$, modulo the known existence
of condition $\varphi$. This step is fundamental to our model and new compared with
previous approaches of [13,15,35]. We remark that the model allows adversaries
to make their queries at their own discretion, e.g., duplicate queries and regular
queries after predicate queries are allowed.
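procedure $\pi_k(x)$
    if $P_k(x) = \bot$:
        $z \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{rng}(P_k)$
        $P_k \xleftarrow{\cup} (x, z)$
    end if
    return $P_k(x)$

procedure $\pi_k^{-1}(z)$
    if $P_k^{-1}(z) = \bot$:
        $x \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{dom}(P_k)$
        $P_k \xleftarrow{\cup} (x, z)$
    end if
    return $P_k^{-1}(z)$

procedure $\pi_k^{\Phi}(y)$
    if $P_k^{\Phi}(y) = \bot$:
        $\{(x^1, z^1), \ldots, (x^B, z^B)\} \xleftarrow{\$} \Sigma_k(P_k, P_k^{\Phi})$
        for $\ell = 1, \ldots, B$:
            if $(x^\ell, z^\ell) \notin P_k$:
                $P_k \xleftarrow{\cup} (x^\ell, z^\ell)$
            end if
        end for
        $P_k^{\Phi} \xleftarrow{\cup} (y, \{(x^1, z^1), \ldots, (x^B, z^B)\})$
    end if
    return $P_k^{\Phi}(y)$

Fig. 1. Random weak cipher $\pi$. An adversary has access to $\pi$, $\pi^{-1}$, and $\pi^{\Phi}$.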
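
The three interfaces of Fig. 1 can be prototyped for toy parameters. The sketch below is illustrative only: it fixes a single key, takes the XOR condition on an index set C as the condition, and draws predicate responses by rejection sampling over all B-tuples, which is uniform over the admissible set but only practical for very small n. The class name `ToyWeakCipher` and its method names are hypothetical, and the code assumes that the admissible set is non-empty whenever a predicate query is made.

```python
import random


class ToyWeakCipher:
    """Lazily sampled weak permutation on n bits for one fixed key (toy sizes only)."""

    def __init__(self, n, B, C, seed=None):
        self.n, self.B = n, B
        self.mask = sum(1 << (i - 1) for i in C)  # bit index 1 = rightmost bit
        self.rng = random.Random(seed)
        self.fwd, self.inv = {}, {}               # the list P_k and its inverse
        self.pred = {}                            # the list of predicate responses

    def _store(self, x, z):
        self.fwd[x] = z
        self.inv[z] = x

    def forward(self, x):
        if x not in self.fwd:
            free = [z for z in range(2 ** self.n) if z not in self.inv]
            self._store(x, self.rng.choice(free))
        return self.fwd[x]

    def inverse(self, z):
        if z not in self.inv:
            free = [x for x in range(2 ** self.n) if x not in self.fwd]
            self._store(self.rng.choice(free), z)
        return self.inv[z]

    def _admissible(self, tup):
        xs = {x for x, _ in tup}
        zs = {z for _, z in tup}
        if len(xs) < self.B or len(zs) < self.B:          # condition (i)
            return False
        for x, z in tup:                                  # condition (ii)
            if self.fwd.get(x, z) != z or self.inv.get(z, x) != x:
                return False
        acc = 0
        for x, z in tup:
            acc ^= x ^ z
        if acc & self.mask:                               # condition (iii)
            return False
        old = {frozenset(t) for t in self.pred.values()}  # condition (iv)
        return frozenset(tup) not in old

    def predicate(self, y):
        if y not in self.pred:
            size = 2 ** self.n
            while True:  # rejection sampling: uniform over admissible tuples
                tup = tuple((self.rng.randrange(size), self.rng.randrange(size))
                            for _ in range(self.B))
                if self._admissible(tup):
                    break
            for x, z in tup:
                if x not in self.fwd:
                    self._store(x, z)
            self.pred[y] = tup
        return self.pred[y]
```

A predicate response is consistent with later forward and inverse queries, because every pair it returns is recorded in the same table that those queries consult.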


2.3 Random Abortable Weak Cipher 

Security analyses in the WCM are significantly more complex than in the ICM 
or IPM, which is in part because predicate queries may consist of older queries. 
This will particularly be an issue once collisions among queries are investigated. 
To suit the analysis for this case, we transform the WCM to an abortable weak
cipher model (AWCM), which we denote as $\overline{\mathrm{BC}}[\Phi(A, B, \varphi)](\kappa, n)$. At a high level,
an abortable weak cipher responds to predicate queries with new query
tuples only, and aborts once it turns out that an older query appears in a newer
predicate query.

For any $k \in \{0,1\}^\kappa$ and partial $P_k$ and $P_k^{\Phi}$, define by $\bar{\Sigma}_k(P_k^{\Phi}) \subseteq
(\{0,1\}^n \times \{0,1\}^n)^B$ the set of all tuples $\{(x^1, z^1), \ldots, (x^B, z^B)\}$ such that

(iii) $\varphi_k(\{(x^1, z^1), \ldots, (x^B, z^B)\})$ holds;

(iv) $\{(x^{\rho(1)}, z^{\rho(1)}), \ldots, (x^{\rho(B)}, z^{\rho(B)})\} \notin \mathrm{rng}(P_k^{\Phi})$ for any permutation $\rho$ on $\{1, \ldots, B\}$.

$\bar{\Sigma}_k(P_k^{\Phi})$ differs from $\Sigma_k(P_k, P_k^{\Phi})$ in that conditions (i) and (ii) are omitted, and
particularly: it is independent of $P_k$. A formal description of a random cipher
$\bar{\pi} \xleftarrow{\$} \overline{\mathrm{BC}}[\Phi(A, B, \varphi)](\kappa, n)$ is given in Fig. 2. It deviates from Fig. 1 as follows: for
every key $k$, $\bar{\pi}_k^{\Phi}$ responds randomly from $\bar{\Sigma}_k(P_k^{\Phi})$, and it aborts if the response
violates one of the two skipped conditions of $\Sigma_k(P_k, P_k^{\Phi})$.

The next lemma shows that the WCM and AWCM are indistinguishable
as long as the abortable weak cipher does not abort, approximately up to the
birthday bound. Here, we assume that $\bar{\Sigma}_k(P_k^{\Phi})$ is always large enough.

Lemma 1. Let $\bar{\pi} \xleftarrow{\$} \overline{\mathrm{BC}}[\Phi(A, B, \varphi)](\kappa, n)$. Consider an adversary that makes
$q$ queries to $\bar{\pi}$. Then,

_ N B 2 q(q + 1) 

Pr (7f sefs abort) < — • 

2 
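procedure $\bar{\pi}_k(x)$
    if $P_k(x) = \bot$:
        $z \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{rng}(P_k)$
        $P_k \xleftarrow{\cup} (x, z)$
    end if
    return $P_k(x)$

procedure $\bar{\pi}_k^{-1}(z)$
    if $P_k^{-1}(z) = \bot$:
        $x \xleftarrow{\$} \{0,1\}^n \setminus \mathrm{dom}(P_k)$
        $P_k \xleftarrow{\cup} (x, z)$
    end if
    return $P_k^{-1}(z)$

procedure $\bar{\pi}_k^{\Phi}(y)$
    if $P_k^{\Phi}(y) = \bot$:
        $\{(x^1, z^1), \ldots, (x^B, z^B)\} \xleftarrow{\$} \bar{\Sigma}_k(P_k^{\Phi})$
        for $\ell = 1, \ldots, B$:
            if $x^\ell \in \mathrm{dom}(P_k) \wedge z^\ell \ne P_k(x^\ell)$: abort
            if $z^\ell \in \mathrm{rng}(P_k) \wedge x^\ell \ne P_k^{-1}(z^\ell)$: abort
            if $(x^\ell, z^\ell) \in \{(x^1, z^1), \ldots, (x^{\ell-1}, z^{\ell-1})\}$: abort
            if $(x^\ell, z^\ell) \notin P_k$:
                $P_k \xleftarrow{\cup} (x^\ell, z^\ell)$
            end if
        end for
        $P_k^{\Phi} \xleftarrow{\cup} (y, \{(x^1, z^1), \ldots, (x^B, z^B)\})$
    end if
    return $P_k^{\Phi}(y)$

Fig. 2. Random abortable weak cipher $\bar{\pi}$. An adversary has access to $\bar{\pi}$, $\bar{\pi}^{-1}$, and $\bar{\pi}^{\Phi}$.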


Proof. Consider the $i$th query, for $i \in \{1, \ldots, q\}$, and assume it is a predicate
query $\bar{\pi}_k^{\Phi}(y)$. We will consider the probability that this query makes $\bar{\pi}$ abort,
provided it has not aborted so far. Prior to this $i$th query, $|P_k| \le B(i - 1)$ and
$|P_k^{\Phi}| < i$. Basic combinatorics shows that

$$|\bar{\Sigma}_k(P_k^{\Phi})| = |\bar{\Sigma}_k(\emptyset)| - B! \cdot |P_k^{\Phi}|,$$

where we use that $\bar{\pi}$ has not aborted so far. This $i$th query aborts only if for
some $\ell \in \{1, \ldots, B\}$, the value $x^\ell$ equals an element in $\mathrm{dom}(P_k) \cup \{x^1, \ldots, x^{\ell-1}\}$
or the value $z^\ell$ equals an element in $\mathrm{rng}(P_k) \cup \{z^1, \ldots, z^{\ell-1}\}$.

Define by $\bar{\Sigma}_k^{\mathrm{abort}}(P_k^{\Phi})$ the set of all elements of $\bar{\Sigma}_k(P_k^{\Phi})$ that would lead to
abort. We have $2B$ possible values to cause the abort (namely, $x^1, \ldots, z^B$), and
it causes the abort if it equals an element in a set of size at most $|P_k| + B$. For
any of these $2B(|P_k| + B)$ choices, the number of tuples in $\bar{\Sigma}_k(P_k^{\Phi})$ complying
with this choice is at most • Thus, 


Pr ( 7 r®(y) sets abort) 


l^f ort (P)l / 2B (\P k \ + B) • ^ 2 B 2 i 

\Z k {P£)\ - \Z k (0)\-B\.\P*\ -2»- T |^ r - 


The proof is completed by summation over i = 1, . . . , q. 


□ 


3 Modeling Known-Key Attacks 

We next apply the WCM to known-key attacks. For the sake of explanation, we
first reconsider the Knudsen-Rijmen attack on Feistel_7 [27]. (A detailed description
of the attack is also given in the full version of this paper.) Let $n \in \mathbb{N}$,
and let $\pi := \pi_k$ be an instance of Feistel_7 with fixed key $k$. Knudsen and
Rijmen revealed four functions $f, f', g, g' : \{0,1\}^{n/2} \to \{0,1\}^n$ such that for
all $y \in \{0,1\}^{n/2}$:

$$g(y) = \pi(f(y)) \ \text{ and } \ g'(y) = \pi(f'(y)),$$
$$\mathrm{Ri}_{n/2}\bigl(f(y) \oplus g(y)\bigr) = \mathrm{Ri}_{n/2}\bigl(f'(y) \oplus g'(y)\bigr). \qquad (2)$$

These four functions depend on the cryptographic primitive underlying Feistel_7
in a complicated way. Therefore, we can safely assume that these functions
behave sufficiently random, besides this particular relation (2), and that they are
unknown to the adversary. The functions $f, f', g, g'$ are all injective and satisfy $f(y) \ne f'(y)$
and $g(y) \ne g'(y)$ for all $y$. On the other hand, collisions of the form $f(y) = f'(y')$
and $g(y) = g'(y')$ may occur.

Generically, the attack demonstrates that for key $k$ there exist $2^{n/2}$ possibly
overlapping sets of distinct queries $\{(x^1, z^1), (x^2, z^2)\}$ that satisfy $\mathrm{Ri}_{n/2}(x^1 \oplus z^1 \oplus
x^2 \oplus z^2) = 0$. In other words, Feistel_7 meets predicate $\Phi(2^{n/2}, 2, \varphi^{\mathrm{Feistel}_7})$ where

$$\varphi_k^{\mathrm{Feistel}_7}\bigl(\{(x^1, z^1), (x^2, z^2)\}\bigr) : \mathrm{Ri}_{n/2}(x^1 \oplus z^1 \oplus x^2 \oplus z^2) = 0.$$

Here, we remark that the Knudsen-Rijmen attack works for any fixed but known
key $k$, and that condition $\varphi_k^{\mathrm{Feistel}_7}$ is in fact independent of the key. In this
work, we will consider a more general predicate $\Phi(A, B, \varphi^C)$ for $A, B \in \mathbb{N}$ and
$C \subseteq \{1, \ldots, n\}$, where

$$\varphi_k^C\bigl(\{(x^1, z^1), \ldots, (x^B, z^B)\}\bigr) : \mathrm{Bits}_C(x^1 \oplus z^1 \oplus \cdots \oplus x^B \oplus z^B) = 0. \qquad (3)$$

This generalized predicate considers the case of arbitrary but fixed and known
keys, where the adversary can even choose the key every time it makes a predicate
query. Note that also the attacks on AES_8 and Threefish_36 (see Sect. 1)
are covered, as they satisfy $\Phi(2^{n/8}, 2, \varphi^C)$ for certain $C$ of size $10n/16$ and
$\Phi(2^{n/8}, 4, \varphi^{\{1, \ldots, n\}})$, respectively. In general, all rebound- or boomerang-based
known-key attacks in the literature are covered by predicate $\Phi(A, B, \varphi^C)$ for some
$A, B, C$. Here, $B$ is always a value independent of $n$ (usually 2 or 4) and $C$ is
regularly a large subset (of size at least $n/4$). Throughout, we consider $A$ to be
sufficiently large.


Basic Computations for AWCM 

For the specific condition $\varphi_C$ of (3), we derive a simpler bound on the probability
that a primitive $\bar{\pi} \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi_C)](\kappa, n)$ aborts, along with some other
elementary observations for $\bar{\pi}$. To this end, we define the notation "$[X]$," which
equals 1 if $X$ holds and 0 otherwise. For conciseness, we introduce the function
$\delta_{B,C}[b]$ defined as

$$\delta_{B,C}[b] = 2^{|C|}\,[B = b] + [B > b]. \quad (4)$$
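As a small illustration of (4), the sketch below evaluates $\delta_{B,C}[b] = 2^{|C|}[B = b] + [B > b]$ for a few parameter choices. The helper names are hypothetical, and only the size $|C|$ is passed in since the function does not depend on which positions $C$ contains.

```python
def iverson(condition):
    """Bracket notation [X]: 1 if X holds and 0 otherwise."""
    return 1 if condition else 0


def delta(B, c_size, b):
    """delta_{B,C}[b] = 2^{|C|} * [B = b] + [B > b], with c_size = |C|."""
    return 2 ** c_size * iverson(B == b) + iverson(B > b)
```

For instance, with $|C| = 10$ the factor $2^{|C|}$ appears only when $B$ equals $b$: `delta(1, 10, 1)` and `delta(2, 10, 2)` are both 1024, whereas `delta(2, 10, 1)` and `delta(3, 10, 2)` are 1.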
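Lemma 2. Let $\bar{\pi} \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi_C)](\kappa, n)$. Consider an adversary that makes
$q \le 2^{n-1}/B$ queries to $\bar{\pi}$. Then,

$$\Pr(\bar{\pi} \text{ sets abort}) \le \frac{B^2 q(q+1)}{2^n - Bq}. \quad (5)$$

Let $k \in \{0,1\}^{\kappa}$ and $Z, Z', Z'' \in \{0,1\}^n$. Consider any new query $\bar{\pi}_k^{\Phi}(y)$ and
assume it does not abort. Write the response as $\{(x^1, z^1), \ldots, (x^B, z^B)\}$. Then,

(i) $\forall\, a \in \{1, \ldots, B\}$ : $\Pr(x^a = Z),\ \Pr(z^a = Z) \le \frac{1}{2^n - Bq}$;

(ii) $\forall\, a \in \{1, \ldots, B\}$ : $\Pr(x^a \oplus z^a = Z) \le \frac{\delta_{B,C}[1]}{2^n - Bq}$;

(iii) $\forall\, \{a, b\} \subseteq \{1, \ldots, B\}$ : $\Pr(x^a \oplus z^a = Z \wedge x^b \oplus z^b = Z') \le \frac{\delta_{B,C}[2]}{2^{2n} - Bq}$;

(iv) $\forall\, \{a, b\} \subseteq \{1, \ldots, B\}$ :
$\Pr(x^a = Z \wedge x^b = Z' \wedge x^a \oplus z^a \oplus x^b \oplus z^b = Z'') \le \frac{\delta_{B,C}[2]}{2^{3n} - Bq}$.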

Proof. Recall from the proof of Lemma 1 that 

\E k (lf)\ = \Z! k (0)\-B\\lf\, 

where \P^\ < Q- For the specific predicate analyzed in this lemma, |J7fc(0)| = 
^nY B ~ 1 2 n ~\ c \ . In the remainder, we regularly bound B\ < B • ( 2 n ) 2B ~ 2 for 
B > 1 or B\ < B • (2 n ) 2B “ 4 for 5 > 2. 


Probability of Abortion. The bound of (5) directly follows from Lemma 1, 
the above-mentioned size of J7fc(0), and the bound on B\. 

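Part (i). Define by $\Sigma_k^{(i)}(P_k^{\Phi})$ the set of all elements of $\Sigma_k(P_k^{\Phi})$ that satisfy
$x^a = Z$. Then, $|\Sigma_k^{(i)}(P_k^{\Phi})| \le (2^n)^{2B-2}\, 2^{n-|C|}$, and

$$\Pr(x^a = Z) = \frac{|\Sigma_k^{(i)}(P_k^{\Phi})|}{|\Sigma_k(P_k^{\Phi})|} \le \frac{1}{2^n - Bq}.$$

A similar analysis applies to the case $z^a = Z$.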

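Part (ii). Define by $\Sigma_k^{(ii)}(P_k^{\Phi})$ the set of all elements of $\Sigma_k(P_k^{\Phi})$ that satisfy
$x^a \oplus z^a = Z$. We make a distinction between $B = 1$ and $B > 1$. In case $B > 1$,
a similar reasoning as in (i) applies, and we have $|\Sigma_k^{(ii)}(P_k^{\Phi})| \le (2^n)^{2B-2}\, 2^{n-|C|}$.
On the other hand, if $B = 1$, we have $|\Sigma_k^{(ii)}(P_k^{\Phi})| = 0$ if $\mathrm{Bits}_C(Z) \neq 0$ and
$|\Sigma_k^{(ii)}(P_k^{\Phi})| \le 2^n$ if $\mathrm{Bits}_C(Z) = 0$. In any case,

$$|\Sigma_k^{(ii)}(P_k^{\Phi})| \le (2^n)^{2B-2}\, 2^{n-|C|}\, \delta_{B,C}[1],$$

and

$$\Pr(x^a \oplus z^a = Z) = \frac{|\Sigma_k^{(ii)}(P_k^{\Phi})|}{|\Sigma_k(P_k^{\Phi})|} \le \frac{\delta_{B,C}[1]}{2^n - Bq}.$$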
Part (iii). This part only applies to $B > 1$; if $B = 1$ the probability equals
0 by construction. Define by $\Sigma_k^{(iii)}(P_k^{\Phi})$ the set of all elements of $\Sigma_k(P_k^{\Phi})$ that
satisfy $x^a \oplus z^a = Z$ and $x^b \oplus z^b = Z'$. We make a distinction between $B = 2$
and $B > 2$. In case $B > 2$, a similar reasoning as in (i) and (ii) applies, and
we have $|\Sigma_k^{(iii)}(P_k^{\Phi})| \le (2^n)^{2B-3}\, 2^{n-|C|}$. On the other hand, if $B = 2$, we have
$|\Sigma_k^{(iii)}(P_k^{\Phi})| = 0$ if $\mathrm{Bits}_C(Z \oplus Z') \neq 0$ and $|\Sigma_k^{(iii)}(P_k^{\Phi})| \le (2^n)^2$ if $\mathrm{Bits}_C(Z \oplus Z') = 0$.
In any case,

$$|\Sigma_k^{(iii)}(P_k^{\Phi})| \le (2^n)^{2B-3}\, 2^{n-|C|}\, \delta_{B,C}[2],$$

and

$$\Pr(x^a \oplus z^a = Z \wedge x^b \oplus z^b = Z') = \frac{|\Sigma_k^{(iii)}(P_k^{\Phi})|}{|\Sigma_k(P_k^{\Phi})|} \le \frac{\delta_{B,C}[2]}{2^{2n} - Bq}.$$


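Part (iv). The approach is fairly similar to case (iii). If $B = 1$ the probability
is 0 by construction. Define by $\Sigma_k^{(iv)}(P_k^{\Phi})$ the set of all elements of $\Sigma_k(P_k^{\Phi})$
that satisfy $x^a = Z$, $x^b = Z'$, and $x^a \oplus z^a \oplus x^b \oplus z^b = Z''$. In case $B > 2$,
we have $|\Sigma_k^{(iv)}(P_k^{\Phi})| \le (2^n)^{2B-4}\, 2^{n-|C|}$. On the other hand, if $B = 2$, we have
$|\Sigma_k^{(iv)}(P_k^{\Phi})| = 0$ if $\mathrm{Bits}_C(Z'') \neq 0$ and $|\Sigma_k^{(iv)}(P_k^{\Phi})| \le 2^n$ if $\mathrm{Bits}_C(Z'') = 0$. In any
case,

$$|\Sigma_k^{(iv)}(P_k^{\Phi})| \le (2^n)^{2B-4}\, 2^{n-|C|}\, \delta_{B,C}[2],$$

and

$$\Pr(x^a = Z \wedge x^b = Z' \wedge x^a \oplus z^a \oplus x^b \oplus z^b = Z'') = \frac{|\Sigma_k^{(iv)}(P_k^{\Phi})|}{|\Sigma_k(P_k^{\Phi})|} \le \frac{\delta_{B,C}[2]}{2^{3n} - Bq}.$$

□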


4 Application to PGV Compression Functions 

We consider the 12 blockcipher-based compression functions from Preneel,
Govaerts, and Vandewalle (PGV) [48]. In the ICM these constructions achieve tight
collision security up to about $2^{n/2}$ queries and preimage security up to about $2^n$
queries [9,10,19,58]. The 12 constructions are depicted in Fig. 3. Here, we follow
the ordering of [10], where PGV1, PGV2, and PGV5 are better known as the
Matyas-Meyer-Oseas [36], Miyaguchi-Preneel, and Davies-Meyer [45]
compression functions.

Baecher et al. [4] analyzed the 12 PGV constructions under ideal cipher
reducibility, which at a high level covers the idea of two constructions being
equally secure for the same underlying idealized blockcipher. They divide the
PGV functions into two classes, in such a way that if some blockcipher makes
one of the constructions secure, it makes all functions in the corresponding class
secure. Applied to our WCM, the results of Baecher et al. imply the following:
Fig. 3. The 12 PGV compression functions. When in iteration mode, the message
comes in at the top. The groups $G_1$ and $G_2$ refer to Lemma 3.


Lemma 3 (Ideal Cipher Reducibility of PGV [4], Informal). Let
$\pi \xleftarrow{\$} \mathrm{BC}[\Phi](n, n)$ for some predicate $\Phi$. Let

$$G_1 = \{1, 4, 5, 8, 9, 12\}, \text{ and } G_2 = \{2, 3, 6, 7, 10, 11\}.$$

For any $\alpha \in \{1, 2\}$ and $i, j \in G_\alpha$, PGV$i$ and PGV$j$ achieve the same level of
collision and preimage security once instantiated with $\pi$.

Baecher et al. also derive a reduction between the two classes, but this reduction
requires a non-direct transformation on the ideal cipher $\pi$,$^1$ making it unsuitable
for our purposes. Thanks to Lemma 3, it suffices to only analyze PGV1 and
PGV2 in the WCM: the bounds carry over to the other 10 PGV constructions.
In Sect. 4.1 we analyze the collision security of these functions in the WCM. The
preimage security is considered in Sect. 4.2.
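For orientation, the two representative constructions can be sketched as follows. The proofs in this section treat a PGV2 evaluation on key $k$ and input $x$ with response $z = \pi_k(x)$ as producing $k \oplus x \oplus z$, and PGV1 as the same without the feed-forward of the key. The 8-bit `toy_cipher` below is a hypothetical stand-in for $\pi_k$ (a key-seeded random permutation), chosen only so that the sketch runs; it is not a secure blockcipher and not part of the paper.

```python
import random

N_BITS = 8  # toy block and key size; real instantiations use much larger n


def toy_cipher(key, x):
    """Hypothetical stand-in for pi_k(x): a key-seeded permutation of {0,1}^8."""
    rng = random.Random(key)
    table = list(range(2 ** N_BITS))
    rng.shuffle(table)
    return table[x]


def pgv1(k, x):
    """Matyas-Meyer-Oseas style: x XOR pi_k(x)."""
    return x ^ toy_cipher(k, x)


def pgv2(k, x):
    """Miyaguchi-Preneel style: k XOR x XOR pi_k(x)."""
    return k ^ x ^ toy_cipher(k, x)
```

The two functions differ only in the key feed-forward, so they coincide for $k = 0$, which is why the attacks later in this section fix the key to 0 and apply to both.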


4.1 Collision Security 


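Theorem 1. Let $n \in \mathbb{N}$. Let $\alpha \in \{1, 2\}$ and consider PGV$\alpha$. Suppose
$\pi \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi_C)](n, n)$. Then, for $q \le 2^{n-1}/B$,

$$\mathbf{Adv}^{\mathrm{col}}_{\mathrm{PGV}\alpha}(q) \le \frac{B^2\, \delta_{B,C}[1]\, q^2}{2^n} + \binom{B}{2} \frac{2\, \delta_{B,C}[2]\, q}{2^n} + \frac{4B^2 q^2}{2^n}.$$


$^1$ If $\pi$ makes the PGV constructions from group $G_1$ secure, there is a transformation
$\tau$ such that $\tau^{\pi}$ makes the constructions from $G_2$ secure, and vice versa.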
Proof. We focus on PGV2. The analysis for PGV1 is a simplification due to
the absence of the feed-forward of the key. We consider any adversary that has
query access to $\pi \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi_C)](n, n)$ and makes $q$ queries. As a first step,
we move from $\pi$ to $\bar{\pi} \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi_C)](n, n)$. By Lemma 2, this costs us an
additional term $\frac{B^2 q(q+1)}{2^n - Bq}$.

A collision for PGV2 would imply the existence of two distinct query pairs
$(k, x, z), (k', x', z')$ such that $k \oplus x \oplus z = k' \oplus x' \oplus z'$. We consider the $i$-th query
($i \in \{1, \ldots, q\}$) to be the first query to make this condition satisfied, and sum
over $i = 1, \ldots, q$ at the end. For regular (forward or inverse) queries, the analysis
of [9,10,58] mostly carries over. The analysis of predicate queries is a bit more
technical.


Query $\bar{\pi}_k(x)$ or $\bar{\pi}_k^{-1}(z)$. The cases are the same by symmetry, and we consider
$\bar{\pi}_k(x)$ only. Denote the response by $z$. There are at most $B(i - 1)$ possible
$(k', x', z')$. As $z$ is randomly drawn from a set of size at least $2^n - Bq$, it satisfies
$z = k \oplus x \oplus k' \oplus x' \oplus z'$ with probability at most $\frac{B(i-1)}{2^n - Bq}$.


Query $\bar{\pi}_k^{\Phi}(y)$. Denote the query response by $\{(k, x^1, z^1), \ldots, (k, x^B, z^B)\}$. In
case the $B$-set contributes only to $(k, x, z)$, the same reasoning as for regular
queries applies with the difference that any query of the $B$-set may be successful
and that the bound of Lemma 2 part (ii) applies: $\frac{B^2\, \delta_{B,C}[1]\, (i-1)}{2^n - Bq}$.

Now, consider the case the predicate query contributes to both $(k, x, z)$ and
$(k, x', z')$. There are $\binom{B}{2}$ ways for the predicate query to contribute (or 0 if
$B = 1$). By Lemma 2 part (iii), which considers the success probability for any
such combination (applied with $Z' = Z$ and summed over all $2^n$ values of $Z$), the
predicate query results in a collision with probability at most

$$\binom{B}{2} \frac{2^n\, \delta_{B,C}[2]}{2^{2n} - Bq} \le \binom{B}{2} \frac{\delta_{B,C}[2]}{2^n - Bq}.$$


Conclusion. Taking the maximum of all success probabilities, the $i$-th query
is successful with probability at most $\frac{B^2\, \delta_{B,C}[1]\, (i-1)}{2^n - Bq} + \binom{B}{2} \frac{\delta_{B,C}[2]}{2^n - Bq}$. Summation
over $i = 1, \ldots, q$ gives

$$\mathbf{Adv}^{\mathrm{col}}_{\mathrm{PGV2}}(q) \le \frac{B^2\, \delta_{B,C}[1]\, q^2}{2(2^n - Bq)} + \binom{B}{2} \frac{\delta_{B,C}[2]\, q}{2^n - Bq} + \frac{B^2 q(q+1)}{2^n - Bq},$$

where the last part of the bound comes from the transition from WCM to
AWCM. The proof is completed by using the fact that $2^n - Bq \ge 2^{n-1}$ for
$Bq \le 2^{n-1}$, and that $q + 1 \le 2q$ for $q \ge 1$. □


We note that the bound gets worse for increasing values of $B$. This has a technical
cause: predicate queries are counted equally expensive as regular queries, but
result in up to $B$ new query tuples. This leads to several factors of $B$ in the
bound. As this work is mainly concerned with differential known-key attacks for
which $B$ is regularly small, these factors are of no major influence.

The implications of the bound of Theorem 1 become more visible when
considering particular choices of $B$ and $C$.
(i) If $B = 1$, then $\mathbf{Adv}^{\mathrm{col}}_{\mathrm{PGV}\alpha}(q) \le \frac{(2^{|C|} + 4)\, q^2}{2^n}$;

(ii) If $B = 2$, then $\mathbf{Adv}^{\mathrm{col}}_{\mathrm{PGV}\alpha}(q) \le \frac{20 q^2}{2^n} + \frac{2^{|C|+1} q}{2^n}$;

(iii) If $B \ge 3$ (independent of $n$), then $\mathbf{Adv}^{\mathrm{col}}_{\mathrm{PGV}\alpha}(q) \le \frac{5B^2 q^2}{2^n} + \frac{B(B-1)\, q}{2^n}$.

In other words, for $B = 2$ and $C$ with $|C| \le n/2$, or for $B \ge 3$ constant and
$C$ arbitrary, the PGV functions achieve the same $2^{n/2}$ collision security level
as in the ICM. On the other hand, if $B = 1$, collisions can be found in about
$2^{(n-|C|)/2}$ queries, and if $B = 2$ with $|C| > n/2$, in about $2^{n-|C|} < 2^{n/2}$ queries.
See also Table 1.
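The effect of $B$ and $|C|$ on the collision bound can be explored numerically. The sketch below assumes that the bound of Theorem 1 reads $B^2\delta_{B,C}[1]q^2/2^n + \binom{B}{2}\,2\delta_{B,C}[2]\,q/2^n + 4B^2q^2/2^n$ with $\delta_{B,C}[b] = 2^{|C|}[B = b] + [B > b]$; the function names are hypothetical and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from math import comb


def delta(B, c_size, b):
    """delta_{B,C}[b] = 2^{|C|} * [B = b] + [B > b], with c_size = |C|."""
    return 2 ** c_size * (1 if B == b else 0) + (1 if B > b else 0)


def pgv_collision_bound(n, B, c_size, q):
    """Evaluate the assumed form of the Theorem 1 bound for q <= 2^(n-1)/B."""
    if B * q > 2 ** (n - 1):
        raise ValueError("bound only stated for q <= 2^(n-1)/B")
    N = 2 ** n
    single = Fraction(B * B * delta(B, c_size, 1) * q * q, N)
    double = Fraction(comb(B, 2) * 2 * delta(B, c_size, 2) * q, N)
    abort = Fraction(4 * B * B * q * q, N)
    return single + double + abort
```

For a toy setting with $n = 16$, $|C| = 4$ and $q = 8$, the case $B = 1$ gives $(2^4 + 4) \cdot 64 / 2^{16} = 5/256$, while $B = 3$ gives $(5 \cdot 9 \cdot 64 + 3 \cdot 2 \cdot 8)/2^{16}$, in which $|C|$ no longer appears.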

Tightness 

For the cases $B = 1$ and $C$ arbitrary, and $B = 2$ and $C$ arbitrary such that
$|C| > n/2$, we derive generic attacks that demonstrate tightness of the bound of
Theorem 1. Knudsen and Rijmen [27] and Sasaki et al. [53,56] already considered
how to exploit a known-key pair for the underlying blockcipher to find a
collision for the Matyas-Meyer-Oseas (PGV1) and/or Miyaguchi-Preneel (PGV2)
compression functions. Their attacks correspond to our $B = 2$ case.

Proposition 1 ( B = 1). Let n G N. Let a G {1,2} and consider PGVo. Sup- 
pose 7 r 4- BC [&(A, l,(p c )](n,n). Then, AdvpQ Va (q) > ^\c\- 

Proof. We construct a collision-finding adversary $\mathcal{A}$ for PGV2. It fixes key $k = 0$,
and makes predicate queries to $\pi_k^{\Phi}$ on input of distinct values $y$ to obtain $q$
queries $(k, x_y, z_y)$ satisfying $\mathrm{Bits}_C(x_y \oplus z_y) = 0$. Any two such queries collide on
the entire state, $x_y \oplus z_y = x_{y'} \oplus z_{y'}$, with probability at least $\frac{1}{2^{n-|C|}}$. The
attack for PGV1 is the same as we have taken $k = 0$. □

Proposition 2 ($B = 2$ and $|C| > n/2$). Let $n \in \mathbb{N}$. Let $\alpha \in \{1, 2\}$ and
consider PGV$\alpha$. Suppose $\pi \xleftarrow{\$} \mathrm{BC}[\Phi(A, 2, \varphi_C)](n, n)$. Then, $\mathbf{Adv}^{\mathrm{col}}_{\mathrm{PGV}\alpha}(q) \ge \frac{q}{2^{n-|C|}}$.

Proof. We construct a collision-finding adversary $\mathcal{A}$ for PGV2. It fixes key $k = 0$,
and makes predicate queries to $\pi_k^{\Phi}$ on input of distinct values $y$ to obtain $q$ 2-sets
$\{(k, x_y^1, z_y^1), (k, x_y^2, z_y^2)\}$ satisfying $\mathrm{Bits}_C(x_y^1 \oplus z_y^1) = \mathrm{Bits}_C(x_y^2 \oplus z_y^2)$. These two
queries collide on the entire state, $k \oplus x_y^1 \oplus z_y^1 = k \oplus x_y^2 \oplus z_y^2$, with probability at
least $\frac{1}{2^{n-|C|}}$. If the adversary makes $q$ predicate queries, we directly obtain our
bound. The attack for PGV1 is the same as we have taken $k = 0$. □
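The per-query success probability $2^{-(n-|C|)}$ used in the $B = 2$ attack can be sanity-checked by exhaustive counting on a toy block size. The sketch below is a simplification: it treats $u = x^1 \oplus z^1$ and $v = x^2 \oplus z^2$ as uniform values subject only to the predicate, and ignores distinctness and freshness of the tuples. The function name and the mask encoding of $C$ are assumptions for illustration.

```python
from fractions import Fraction


def collision_fraction(n, c_mask):
    """Among all (u, v) with (u XOR v) zero on the bits in c_mask,
    return the fraction of pairs with u == v (a full-state collision)."""
    satisfying = 0
    colliding = 0
    for u in range(2 ** n):
        for v in range(2 ** n):
            if (u ^ v) & c_mask == 0:
                satisfying += 1
                if u == v:
                    colliding += 1
    return Fraction(colliding, satisfying)


# Toy parameters: n = 6 and |C| = 4 (the four leading bits).
toy_fraction = collision_fraction(6, 0b111100)
```

Here `toy_fraction` equals $1/4 = 2^{-(6-4)}$: the predicate already forces agreement on $|C|$ bits, so only the remaining $n - |C|$ bits have to match by chance.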


4.2 Preimage Security 

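Theorem 2. Let $n \in \mathbb{N}$. Let $\alpha \in \{1, 2\}$ and consider PGV$\alpha$. Suppose
$\pi \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi_C)](n, n)$. Then, for $q \le 2^{n-2}/B$,

$$\mathbf{Adv}^{\mathrm{epre}}_{\mathrm{PGV}\alpha}(q) \le \left(\frac{2Bq}{2^n}\right)^{B} + \frac{2B^2\, \delta_{B,C}[1]\, q}{2^n}.$$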
The proof is given in Appendix A. It is much more involved than the one of
Theorem 1, particularly as we cannot make use of abortable ciphers. Entering
various choices of $B$ and $C$ shows that the PGV functions remain mostly
unaffected in the WCM if $B \ge 2$, and the same security level as in the ICM is
achieved [9,10,58]. A slight security degradation appears for $B = 1$ as preimages
can be found in about $2^{n-|C|}$ queries. In the full version, we present a matching attack
in the WCM.


5 Application to Grøstl Compression Function

We consider the provable security of the compression function mode of operation
of Grøstl [21] (see also Fig. 4):

$$F_{\text{Grøstl}}(x_1, x_2) = x_2 \oplus \pi_1(x_1) \oplus \pi_2(x_1 \oplus x_2). \quad (6)$$

The Grøstl compression function is in fact designed to operate in a wide-pipe
mode, and in the IPM, the function is proven collision secure up to about $2^{n/4}$
queries and preimage secure up to $2^{n/2}$ queries [20]. We consider the security
of $F_{\text{Grøstl}}$ in the WCM, where $(\pi_1, \pi_2) \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi_C)](n)^2$. We remark that
in this section we consider keyless primitives, hence $\kappa = 0$ and the $k$-input is
dropped throughout. We furthermore note that finding collisions and preimages
for $F_{\text{Grøstl}}$ is equivalent to finding them for

$$F'_{\text{Grøstl}}(x_1, x_2) = x_1 \oplus x_2 \oplus \pi_1(x_1) \oplus \pi_2(x_2), \quad (7)$$

as $F_{\text{Grøstl}}(x_1, x_2) = F'_{\text{Grøstl}}(x_1, x_1 \oplus x_2)$, and we will consider $F'_{\text{Grøstl}}$ throughout.
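The equivalence between (6) and (7) is a change of variables on the second input, which can be checked exhaustively for a toy block size. In the sketch below, `toy_permutation` is a hypothetical helper that builds a seeded random permutation on 6-bit values; it only stands in for $\pi_1$ and $\pi_2$ and has nothing to do with the actual Grøstl permutations.

```python
import random

N_BITS = 6  # toy block size so that all inputs can be enumerated


def toy_permutation(seed):
    """Hypothetical stand-in for pi_i: a seeded permutation of {0,1}^6."""
    rng = random.Random(seed)
    table = list(range(2 ** N_BITS))
    rng.shuffle(table)
    return lambda x: table[x]


def f_groestl(pi1, pi2, x1, x2):
    """Equation (6): x2 XOR pi1(x1) XOR pi2(x1 XOR x2)."""
    return x2 ^ pi1(x1) ^ pi2(x1 ^ x2)


def f_groestl_prime(pi1, pi2, x1, x2):
    """Equation (7): x1 XOR x2 XOR pi1(x1) XOR pi2(x2)."""
    return x1 ^ x2 ^ pi1(x1) ^ pi2(x2)


pi1 = toy_permutation(1)
pi2 = toy_permutation(2)
inputs = range(2 ** N_BITS)
equivalent = all(
    f_groestl(pi1, pi2, x1, x2) == f_groestl_prime(pi1, pi2, x1, x1 ^ x2)
    for x1 in inputs
    for x2 in inputs
)
```

Since $(x_1, x_2) \mapsto (x_1, x_1 \oplus x_2)$ is a bijection, any collision or preimage for one function translates directly into one for the other.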

5.1 Collision Security 

Theorem 3. Let $n \in \mathbb{N}$. Suppose $(\pi_1, \pi_2) \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi_C)](n)^2$. Then, for
$q \le 2^{n-1}/B$,

Adv co1 ( ) < /H\ 2^, c [2](g 2 +2"/ 2 -I^G) 4 £?V 

Vp Gr0 S ti W _ 2 n + V2/ 2 n ' 2 • 2 n / 2 2 n 



Fig. 4. Grøstl compression function (left) and Shrimpton-Stam (right).

The proof is given in the full version of the paper. If we enter particular choices
of $B$ and $C$ into the bound, we find results comparable to the case of Sect. 4.1.
In more detail, for $B = 2$ and $C$ with $|C| \le n/2$, or for $B \ge 3$ constant and $C$
arbitrary, $F'_{\text{Grøstl}}$ achieves the same $2^{n/4}$ collision security level as in the IPM
[20]. If $B = 1$, the bound guarantees security up to about $2^{(n-|C|)/4}$ queries, and if
$B = 2$ with $|C| > n/2$, collisions can be found in about $2^{(n-|C|)/2}$ queries. See
also Table 1. In the full version, we also show that the bound is optimal, by
presenting tight attacks on $F'_{\text{Grøstl}}$ in the WCM.


5.2 Preimage Security 

Theorem 4. Let $n \in \mathbb{N}$. Suppose $(\pi_1, \pi_2) \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi_C)](n)^2$. Then, for
$q \le 2^{n-1}/B$,


Adv epre , , < 2BH b>c [1] (g 2 + Bq 4BV 

V F — 9 n “T ^ /o “T 


2 n / 2 


The proof is given in the full version of the paper. As before, we find that $F'_{\text{Grøstl}}$
remains unaffected in the WCM for most cases, the sole exception being $B = 1$
for which preimages can be found in about $2^{(n-|C|)/2}$ queries. In the full version, we
also show that the bound is optimal, by presenting a tight attack on $F'_{\text{Grøstl}}$ for
$B = 1$ in the WCM.


6 Application to Shrimpton-Stam Compression Function 

In this section, we consider the provable security of the Shrimpton-Stam
compression function [57] (see also Fig. 4):

$$F_{\mathrm{SS}}(x_1, x_2) = x_1 \oplus \pi_1(x_1) \oplus \pi_3\bigl(x_1 \oplus \pi_1(x_1) \oplus x_2 \oplus \pi_2(x_2)\bigr). \quad (8)$$

This function is proven asymptotically optimally collision and preimage secure
up to $2^{n/2}$ queries in the IPM [41,51,57]. We consider the security of $F_{\mathrm{SS}}$ in
the WCM, where $(\pi_1, \pi_2, \pi_3) \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi_C)](n)^3$. (As in Sect. 5 we consider
keyless functions, hence $\kappa = 0$ and the key inputs are dropped throughout.) Our
findings readily apply to the generalization of $F_{\mathrm{SS}}$ of [41]. The analysis of this
construction is significantly more complex than the ones of Sects. 4 and 5.
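The data flow of (8) can be sketched in a few lines, assuming the reading $F_{\mathrm{SS}}(x_1, x_2) = x_1 \oplus \pi_1(x_1) \oplus \pi_3(x_1 \oplus \pi_1(x_1) \oplus x_2 \oplus \pi_2(x_2))$. The permutations are passed in as plain callables; the XOR-with-constant maps used in the example are hypothetical placeholders chosen so that the result can be verified by hand.

```python
def f_ss(pi1, pi2, pi3, x1, x2):
    """Shrimpton-Stam style compression function built from three permutations."""
    y1 = x1 ^ pi1(x1)  # first branch with feed-forward
    y2 = x2 ^ pi2(x2)  # second branch with feed-forward
    return y1 ^ pi3(y1 ^ y2)


# Hand-checkable placeholders: XOR with a constant is a permutation.
def flip_one(x):
    return x ^ 1


def flip_five(x):
    return x ^ 5


example_output = f_ss(flip_one, flip_one, flip_five, 10, 20)
```

With these placeholders both branches evaluate to 1, so the third permutation is called on 0 and the output is $1 \oplus 5 = 4$, independent of the inputs; a real instantiation would of course use permutations without such structure.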


6.1 Collision Security 

Theorem 5. Let $n \in \mathbb{N}$. Suppose $(\pi_1, \pi_2, \pi_3) \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi_C)](n)^3$. Then,

(i) If $B = 1$ and $C$ arbitrary, $\mathbf{Adv}^{\mathrm{col}}_{F_{\mathrm{SS}}}(2^{(n-|C|)/2 - n\varepsilon}) \to 0$ for $n \to \infty$;

(ii) If $B = 2$ and $C$ with $|C| \le n/2$, $\mathbf{Adv}^{\mathrm{col}}_{F_{\mathrm{SS}}}(2^{n/2 - n\varepsilon}) \to 0$ for $n \to \infty$;

(iii) If $B = 2$ and $C$ with $|C| > n/2$, $\mathbf{Adv}^{\mathrm{col}}_{F_{\mathrm{SS}}}(2^{n - |C| - n\varepsilon}) \to 0$ for $n \to \infty$;

(iv) If $B \ge 3$ (independent of $n$) and $C$ arbitrary, $\mathbf{Adv}^{\mathrm{col}}_{F_{\mathrm{SS}}}(2^{n/2 - n\varepsilon}) \to 0$ for
$n \to \infty$.

Due to the technicality of the proof, the results are expressed in asymptotic
terms. The proof is given in the full version of the paper. For $B = 2$ and $C$ with
$|C| \le n/2$, or for $B \ge 3$ constant and $C$ arbitrary, $F_{\mathrm{SS}}$ achieves the same security
level as in the IPM. On the other hand, if $B = 1$, or if $B = 2$ but $|C| > n/2$,
Theorem 5 results in a worse bound. See also Table 1. In the full version, we also
show that the bound is optimal, by presenting tight attacks on $F_{\mathrm{SS}}$ in the WCM.


6.2 Preimage Security 

Theorem 6. Let $n \in \mathbb{N}$. Suppose $(\pi_1, \pi_2, \pi_3) \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi_C)](n)^3$. Then,

(i) If $B = 1$ and $C$ with $|C| \le n/2$, $\mathbf{Adv}^{\mathrm{epre}}_{F_{\mathrm{SS}}}(2^{n/2 - n\varepsilon}) \to 0$ for $n \to \infty$;

(ii) If $B = 1$ and $C$ with $|C| > n/2$, $\mathbf{Adv}^{\mathrm{epre}}_{F_{\mathrm{SS}}}(2^{n - |C| - n\varepsilon}) \to 0$ for $n \to \infty$;

(iii) If $B \ge 2$ (independent of $n$) and $C$ arbitrary, $\mathbf{Adv}^{\mathrm{epre}}_{F_{\mathrm{SS}}}(2^{n/2 - n\varepsilon}) \to 0$ for
$n \to \infty$.

As for collision resistance, the results are expressed in asymptotic terms. The
proof is given in the full version of the paper. The bounds match the ones in
the IPM, except for the case of $B = 1$ and $|C| > n/2$. We leave it as an open
problem to prove tightness of Theorem 6 part (ii).

7 Conclusions 

Since their formal introduction by Knudsen and Rijmen at ASIACRYPT 2007
[27], numerous known-key attacks on blockciphers have appeared in literature.
These attacks are often considered delicate, as it is not always clear to what
extent they influence the security of cryptographic functions based on these
known-key blockciphers. We presented the weak cipher model in order to
investigate this impact. For a specific instance of this model, considering the
existence of $A$ sets of $B$ queries that satisfy condition $\varphi_C$ of (3), we proved that the
PGV compression functions [48], the Grøstl compression function [21], and the
Shrimpton-Stam compression function [57] remain mostly unaffected by the
generalized weakness. Additionally, preimage security of the functions turned out to
be significantly less susceptible to these types of weaknesses than collision
security. The results can be readily generalized to other primitive-based functions,
such as the double block length compression functions Tandem-DM,
Abreast-DM, and Hirose's compression functions [23,30], and to the permutation-based
sponge mode [5].

Our model is general enough to cover practically all differential known-key
attacks in literature, such as the latest results based on the rebound attack
[12,22,28,38,52,53,56] and on the boomerang attack [2,7,31,54,60]. To our
knowledge, our work provides the first attempt to formally analyze the effect
of a wide class of cryptanalytic attacks from a modular and provable security
point of view. It is a step in the direction of security beyond the ideal model,
connecting practical attacks from cryptanalysis with ideal model provable security.
There is still a long way to go: in order to make the connection between the two
fields, we abstracted known-key attacks to a certain degree. It remains a highly
challenging open research problem to generalize our findings to multiple or
different weaknesses, and to different permutation-based cryptographic functions.
These generalizations include the analysis of known-key based constructions for
more advanced conditions $\varphi$ (such as arbitrary polynomials).
A Proof of Theorem 2


We focus on PGV2. The analysis for PGV1 is a simplification due to the absence
of the feed-forward of the key. We consider any adversary that has query access
to $\pi \xleftarrow{\$} \mathrm{BC}[\Phi(A, B, \varphi_C)](n, n)$ and makes $q$ queries. Let $Z \in \{0, 1\}^n$. A preimage
for $Z$ would imply the existence of a query $(k, x, z)$ such that $x \oplus z = k \oplus Z$.
We consider the $i$-th query ($i \in \{1, \ldots, q\}$) to be the first query to make this
condition satisfied, and sum over $i = 1, \ldots, q$ at the end. For regular (forward
or inverse) queries, the analysis of [9,10,58] mostly carries over. The analysis
of predicate queries is more technical, particularly as we cannot make use of
abortable ciphers.

Query $\pi_k(x)$ or $\pi_k^{-1}(z)$. The cases are the same by symmetry, and we consider
$\pi_k(x)$ only. Denote the response by $z$. As $z$ is randomly drawn from a set of size
at least $2^n - Bq$, it satisfies $z = x \oplus k \oplus Z$ with probability at most $\frac{1}{2^n - Bq}$.

Query $\pi_k^{\Phi}(y)$. Denote the query response by $\{(k, x^1, z^1), \ldots, (k, x^B, z^B)\}$. If all
tuples are old, the query cannot be successful as no earlier query was successful,
and so we assume it contains at least one new tuple. The response is drawn
uniformly at random from the set $\Sigma_k(P_k, P_k^{\Phi})$. For $\ell = 0, \ldots, B$, denote by
$\Sigma_k^{\ell}(P_k, P_k^{\Phi})$ the subset of all responses that have $\ell$ new query tuples and $B - \ell$
old query tuples (which already appear in $P_k$). By construction,

$$\Sigma_k(P_k, P_k^{\Phi}) = \bigcup_{\ell=0}^{B} \Sigma_k^{\ell}(P_k, P_k^{\Phi}). \quad (9)$$

Define furthermore for $\ell = 1, \ldots, B$ by $\Sigma_k^{\ell,\mathrm{pre}}(P_k, P_k^{\Phi})$ the subset of elements of
$\Sigma_k^{\ell}(P_k, P_k^{\Phi})$ for which one of the new query tuples satisfies $x \oplus z = k \oplus Z$ (recall
that we have excluded the case of $\ell = 0$). The predicate query is successful with
probability

$$\Pr\bigl(\pi_k^{\Phi}(y) \text{ sets } \mathrm{pre}(Q_i)\bigr) = \sum_{\ell=1}^{B} \frac{|\Sigma_k^{\ell,\mathrm{pre}}(P_k, P_k^{\Phi})|}{|\Sigma_k(P_k, P_k^{\Phi})|}. \quad (10)$$
Using (9), we bound (10) as

$$\Pr\bigl(\pi_k^{\Phi}(y) \text{ sets } \mathrm{pre}(Q_i)\bigr) \le \frac{|\Sigma_k^{1,\mathrm{pre}}(P_k, P_k^{\Phi})|}{|\Sigma_k^{B}(P_k, P_k^{\Phi})|} + \sum_{\ell=2}^{B} \frac{|\Sigma_k^{\ell,\mathrm{pre}}(P_k, P_k^{\Phi})|}{|\Sigma_k^{\ell}(P_k, P_k^{\Phi})|}. \quad (11)$$

The reason why $\ell = 1$ is treated differently will become clear shortly.

We next bound all relevant sets. Here, for integers $a \ge b \ge 1$, we denote by
$a^{\underline{b}} = \frac{a!}{(a-b)!}$ the falling factorial power. Starting with the numerators, for $\ell = 1$
we have

$$|\Sigma_k^{1,\mathrm{pre}}(P_k, P_k^{\Phi})| \le B \cdot |P_k|^{\underline{B-1}} \cdot (2^n - |P_k|).$$

Indeed, we have $B$ positions for the sole new query to appear and $|P_k|^{\underline{B-1}}$ choices
for the old queries. For the new query, without loss of generality $(k, x^B, z^B)$, it
needs to satisfy $\mathrm{Bits}_C(x^B \oplus z^B) = \mathrm{Bits}_C(x^1 \oplus \cdots \oplus z^{B-1})$ and $x^B \oplus z^B = k \oplus Z$.
We have $2^n - |P_k|$ possible choices for $x^B$, and any choice gives at most one
possible $z^B$. We remark that $|\Sigma_k^{1,\mathrm{pre}}(P_k, P_k^{\Phi})|$ will probably be about a factor $2^{|C|}$
less, as we should only count all possible solutions for the $B - 1$ old queries
that satisfy $\mathrm{Bits}_C(x^1 \oplus \cdots \oplus z^{B-1}) = \mathrm{Bits}_C(k \oplus Z)$. Deriving a tighter bound
would be a cumbersome exercise, but fortunately there is no need to do so: the
fraction of elements in $\Sigma_k(P_k, P_k^{\Phi})$ consisting of $B - 1$ old tuples is already small
enough for the case $B > 1$. This is the reason why we use a special treatment
for the case of $\ell = 1$ in (11).

For $\ell \in \{2, \ldots, B\}$ we have

$$|\Sigma_k^{\ell,\mathrm{pre}}(P_k, P_k^{\Phi})| \le \binom{B}{\ell} \cdot |P_k|^{\underline{B-\ell}} \cdot (2^n - |P_k|)^{\underline{\ell}} \cdot \ell \cdot (2^n - |P_k|)^{\underline{\ell-2}} \cdot 2^{n-|C|}.$$

Again, the first term comes from identifying at which positions the new queries
appear and the second term comes from the selection of old queries. Next, we
have $(2^n - |P_k|)^{\underline{\ell}}$ choices for the $x$-values and $\ell$ positions for the "winning query"
to occur. For this particular winning query, the corresponding $z$-value is fixed
by the equation $x \oplus z = k \oplus Z$. For the remaining $\ell - 1$ $z$-values, there are
$(2^n - |P_k|)^{\underline{\ell-2}}$ possibilities to freely fix the first $\ell - 2$ of them, and the last one
will be adapted to the predicate condition, and can take at most $2^{n-|C|}$ values.

Regarding the denominators, for $\ell \in \{1, \ldots, B\}$ we have

$$|\Sigma_k^{\ell}(P_k, P_k^{\Phi})| \ge \binom{B}{\ell} \cdot |P_k|^{\underline{B-\ell}} \cdot \Bigl( (2^n - |P_k|)^{\underline{\ell}} \cdot (2^n - |P_k|)^{\underline{\ell-1}} \cdot 2^{n-|C|} - Bq \cdot (2^n - |P_k|)^{\underline{\ell-1}} \cdot (2^n - |P_k|)^{\underline{\ell-1}} \cdot 2^{n-|C|} \Bigr),$$

which can be seen as follows. As before, we have $\binom{B}{\ell}$ positions for the new
queries to appear and $|P_k|^{\underline{B-\ell}}$ possible lists of old queries. Regarding the $\ell$ new
queries, without loss of generality $(k, x^1, z^1), \ldots, (k, x^{\ell}, z^{\ell})$, these need to satisfy
$\mathrm{Bits}_C(x^1 \oplus \cdots \oplus z^{\ell}) = \mathrm{Bits}_C(x^{\ell+1} \oplus \cdots \oplus z^B)$. We first compute the number of
choices for these new queries where $z^{\ell}$ is only used to adapt to this condition
and does not need to satisfy that it is fresh. For this case, we have precisely
$(2^n - |P_k|)^{\underline{\ell}} \cdot (2^n - |P_k|)^{\underline{\ell-1}}$ choices for $x^1, z^1, \ldots, z^{\ell-1}, x^{\ell}$, and $2^{n-|C|}$ possibilities
for the adaption value $z^{\ell}$.

Now, we subtract the cases where this adapted value happens to collide,
either with an older value in $\mathrm{rng}(P_k)$ or with any of the new $z^1, \ldots, z^{\ell-1}$. Any of
these choices would fix $z^{\ell}$ (in total at most $(|P_k| + \ell - 1)$ possibilities). Similarly
to the analysis for $|\Sigma_k^{\ell,\mathrm{pre}}(P_k, P_k^{\Phi})|$, where now $x^{\ell}$ will be used to be adapted to
the predicate condition, there are at most

$$(|P_k| + \ell - 1) \cdot (2^n - |P_k|)^{\underline{\ell-1}} \cdot (2^n - |P_k|)^{\underline{\ell-1}} \cdot 2^{n-|C|}$$

choices for the fresh values. As $\ell \le B$, and additionally $|P_k| \le B(i - 1) \le B(q - 1)$
for the current query, we obtain our bound for $|\Sigma_k^{\ell}(P_k, P_k^{\Phi})|$. The bound can be
simplified to

$$|\Sigma_k^{\ell}(P_k, P_k^{\Phi})| \ge \binom{B}{\ell} \cdot |P_k|^{\underline{B-\ell}} \cdot (2^n - |P_k|)^{\underline{\ell-1}} \cdot (2^n - |P_k|)^{\underline{\ell-1}} \cdot 2^{n-|C|} \cdot (2^n - 2Bq),$$

using that $\frac{(2^n - |P_k|)^{\underline{\ell}}}{(2^n - |P_k|)^{\underline{\ell-1}}} = 2^n - |P_k| - (\ell - 1) \ge 2^n - Bq$.

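Plugging these bounds into (11), we find for the case $B = 1$:

$$\Pr\bigl(\pi_k^{\Phi}(y) \text{ sets } \mathrm{pre}(Q_i)\bigr) \le \frac{2^n - |P_k|}{2^{n-|C|} \cdot (2^n - 2q)} \le \frac{2^{|C|}}{2^n - 2q}.$$

For the case $B > 1$ the computation is a bit more elaborate:

$$\Pr\bigl(\pi_k^{\Phi}(y) \text{ sets } \mathrm{pre}(Q_i)\bigr) \le \frac{B \cdot (2^n - |P_k|)}{(2^n - |P_k|)^{\underline{B-1}} \cdot 2^{n-|C|} \cdot (2^n - 2Bq)} \cdot \frac{|P_k|^{\underline{B-1}}}{(2^n - |P_k|)^{\underline{B-1}}} + \sum_{\ell=2}^{B} \frac{\ell \cdot (2^n - |P_k|)^{\underline{\ell}} \cdot (2^n - |P_k|)^{\underline{\ell-2}}}{(2^n - |P_k|)^{\underline{\ell-1}} \cdot (2^n - |P_k|)^{\underline{\ell-1}}} \cdot \frac{1}{2^n - 2Bq}.$$

For the first fraction we use that $2^n - |P_k| \le (2^n - |P_k|)^{\underline{B-1}}$ as $B > 1$, and
additionally that $|C| \le n$. For the falling factorial powers of the second fraction,
we use that $|P_k|^{\underline{B-1}} \le (Bq)^{B-1}$ and $(2^n - |P_k|)^{\underline{B-1}} \ge (2^n - |P_k| - (B - 1))^{B-1} \ge (2^n - 2Bq)^{B-1}$.
For the fraction in the sum, we use that

$$\frac{(2^n - |P_k|)^{\underline{\ell}} \cdot (2^n - |P_k|)^{\underline{\ell-2}}}{(2^n - |P_k|)^{\underline{\ell-1}} \cdot (2^n - |P_k|)^{\underline{\ell-1}}} = \frac{2^n - |P_k| - (\ell - 1)}{2^n - |P_k| - (\ell - 2)} \le 1.$$

We obtain:

$$\Pr\bigl(\pi_k^{\Phi}(y) \text{ sets } \mathrm{pre}(Q_i)\bigr) \le \frac{B}{2^n - 2Bq} \cdot \frac{(Bq)^{B-1}}{(2^n - 2Bq)^{B-1}} + \sum_{\ell=2}^{B} \frac{\ell}{2^n - 2Bq} \le \frac{B^B q^{B-1}}{(2^n - 2Bq)^B} + \frac{B^2}{2^n - 2Bq}.$$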
Conclusion. Taking the maximum of all success probabilities, the $i$-th query
is successful with probability at most $\frac{B^B q^{B-1}}{(2^n - 2Bq)^B} + \frac{B^2\, \delta_{B,C}[1]}{2^n - 2Bq}$. Summation over
$i = 1, \ldots, q$ gives

$$\mathbf{Adv}^{\mathrm{epre}}_{\mathrm{PGV2}}(q) \le \frac{B^B q^B}{(2^n - 2Bq)^B} + \frac{B^2\, \delta_{B,C}[1]\, q}{2^n - 2Bq}.$$

The proof is completed by using the fact that $2^n - 2Bq \ge 2^{n-1}$ for $Bq \le 2^{n-2}$.
Abstract. HMAC and its variant NMAC are the most popular approaches
to deriving a MAC (and more generally, a PRF) from a cryptographic
hash function. Despite nearly two decades of research, their exact security
still remains far from understood in many different contexts. Indeed, recent
works have re-surfaced interest for generic attacks, i.e., attacks that treat
the compression function of the underlying hash function as a black box.

Generic security can be proved in a model where the underlying
compression function is modeled as a random function - yet, to date, the
question of proving tight, non-trivial bounds on the generic security of
HMAC/NMAC even as a PRF remains a challenging open question.

In this paper, we ask the question of whether a small modification to
HMAC and NMAC can allow us to exactly characterize the security of the
resulting constructions, while only incurring little penalty with respect to
efficiency. To this end, we present simple variants of NMAC and HMAC,
for which we prove tight bounds on the generic PRF security, expressed in
terms of numbers of construction and compression function queries
necessary to break the construction. All of our constructions are obtained via a
(near) black-box modification of NMAC and HMAC, which can be
interpreted as an initial step of key-dependent message pre-processing.

While our focus is on PRF security, a further attractive feature of
our new constructions is that they clearly defeat all recent generic attacks
against properties such as state recovery and universal forgery. These
exploit properties of the so-called "functional graph" which are not directly
accessible in our new constructions.


Keywords: Message authentication codes · HMAC · Generic attacks ·
Provable security


1 Introduction 

This paper presents new variants of the HMAC/NMAC constructions of message
authentication codes which enjoy provable security as a pseudorandom function
(PRF) against generic distinguishing attacks, i.e., attacks which treat the
compression function of the underlying hash function as a black box. In particular,
we prove concrete tight bounds in terms of the number of queries to the
construction and to the compression function necessary to distinguish our construction
from a random function. Our constructions are the first HMAC/NMAC variants
to enjoy such a tight analysis, and we see this as an important stepping stone
towards the understanding of the generic security of such constructions.

Hash-Based MACs. HMAC [3] is the most widely used approach to key a hash
function $H$ to obtain a PRF or a MAC. It computes the output on message $M$
and a key $K$ as

$$\mathrm{HMAC}(K, M) = H(K \oplus \mathrm{opad} \,\|\, H(K \oplus \mathrm{ipad} \,\|\, M)),$$

where $\mathrm{opad} \neq \mathrm{ipad}$ are constants.$^1$ Usually, $H$ is a hash function like SHA-1,
SHA-256 or MD5, in particular following the Merkle-Damgård paradigm [4,16].
That is, it extends a compression function $f : \{0,1\}^c \times \{0,1\}^b \to \{0,1\}^c$ into
a hash function $\mathrm{MD}^f_{IV}$ by first padding $M$ into $b$-bit blocks $M[1], \ldots, M[\ell]$, and
then producing the output $H(M) = S_{\ell}$, where

$$S_0 \leftarrow IV, \quad S_i \leftarrow f(S_{i-1} \,\|\, M[i]) \ \text{ for all } i = 1, \ldots, \ell \quad (1)$$

starting with the $c$-bit initialization value $IV$. A cleaner yet slightly less practical
variant of HMAC is NMAC, which instead outputs

$$\mathrm{NMAC}_{K_{\mathrm{in}}, K_{\mathrm{out}}}(M) = \mathrm{MD}^f_{K_{\mathrm{out}}}\bigl(\mathrm{MD}^f_{K_{\mathrm{in}}}(M)\bigr),$$

where $K_{\mathrm{in}}, K_{\mathrm{out}} \in \{0,1\}^c$ are key values.
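The HMAC formula above can be reproduced directly with a standard hash library. The following sketch instantiates $H$ with SHA-256 and uses the constants and key handling of the standardized HMAC (pad bytes 0x36 and 0x5C, keys zero-padded to the block length, over-long keys hashed first); these details come from the HMAC standard rather than from the text, which defers them to Sect. 2. The function name is chosen for illustration.

```python
import hashlib
import hmac


def hmac_sha256(key, message):
    """Compute H(K xor opad || H(K xor ipad || M)) with H = SHA-256."""
    block_size = hashlib.sha256().block_size  # 64 bytes, the block length b
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()  # over-long keys are hashed first
    key = key.ljust(block_size, b"\x00")  # shorter keys are zero-padded
    ipad = bytes(byte ^ 0x36 for byte in key)
    opad = bytes(byte ^ 0x5C for byte in key)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()


# Cross-check against the standard library implementation.
reference = hmac.new(b"key", b"message", hashlib.sha256).digest()
matches_reference = hmac_sha256(b"key", b"message") == reference
```

The inner and outer calls each start from the fixed $IV$ and absorb a key-dependent first block, which is how HMAC emulates the two keys $K_{\mathrm{in}}$ and $K_{\mathrm{out}}$ of NMAC with an unmodified hash function.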

Security of HMAC/NMAC. The security of both constructions has been
studied extensively, both by obtaining security proofs and proposing attacks. On the
former side, NMAC and HMAC were proven to be secure pseudorandom functions
(PRFs) in the standard model [3], later also using weaker assumptions [2] and via
a tight bound in the uniform setting [7]. However, as argued in [7], this
standard-model bound might be overly pessimistic, covering also very unnatural
constructions of the underlying compression function $f$ (for example the one used in their
tightness proof). The authors hence argue for the need of an analysis of the PRF
security of HMAC in the so-called ideal compression function model where the
compression function is modelled as an ideal random function and the adversary is
allowed to query it. This model was previously used by Dodis et al. [6] to study
indifferentiability of HMAC, which however only holds for certain key lengths.

This is also the model implicitly underlying many of the recently proposed
attacks on hash-based MACs [5,10,15,17,19,20,22]. These attacks are termed
generic, meaning they can be mounted for any underlying hash function as long
as it follows the Merkle-Damgård (MD) paradigm. The complexity of such a
generic attack is then expressed in the number of key-dependent queries to the
construction (denoted $q_C$) as well as the number of queries to the underlying
compression function (denoted $q_f$). These two classes of queries are also often
referred to as online and offline, respectively.

$^1$ Some details such as padding and arbitrary key length are addressed in Sect. 2.
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All iterated MACs are subject to the long-known Preneel and van Oorschot’s 
attack [21] which implies a forgery (and hence also distinguishing) attack against 
HMAC/NMAC making qc = 2 C / 2 construction queries (consisting of constant- 
length messages) and no direct compression function queries (i.e., qf = 0). This 
immediately raises two questions: 

How does the security o/HMAC and NMAC degrade (in terms of tolerable 
qc) by increasing (1) the length £ of the messages and (2) the number 
qf of compression-function evaluations? 

The first question has been partially addressed in [7]. Their result 2 can be inter- 
preted as giving tight bounds on the PRF security of NMAC against an attacker 
making qc key-dependent construction queries (of length at most £ < 2 C / 3 
6-bit blocks) but no queries to the compression function. They show that both 
constructions can only be distinguished from random function with advan- 
tage roughly e(qc,£) ~ £ 1+ °^qc 2 /2 C , improving significantly on the bound 
€ {qc,£) ~ ^ 2 #c 2 /2 c provable using standard folklore techniques. From our per- 
spective, this bound can be read as a smooth trade-off: with increasing maximum 
allowed query length £ it tells us how many queries qc can be tolerated for any 
acceptable upper bound on advantage. 

Still, it is not clear how this trade-off changes when allowing extremely long 
messages ($\ell > 2^{c/3}$) and/or some queries to the compression function 
($q_f > 0$). Note that while huge $\ell$ can be prevented by standards, in practical 
settings $q_f$ is very likely to be much higher than $q_C$, as it represents cheap 
local (offline) computation of the attacker. We therefore focus on capturing the 
trade-off between $q_C$ and $q_f$ for values of $q_C$ that do not allow mounting the 
attack from [21]. However, as we argue below, getting such a tight trade-off for 
NMAC/HMAC seems to be out of reach for now. We hence relax the problem by allowing 
for slight modifications to the vanilla NMAC/HMAC construction. 

Our Contributions. We ask the following question here, and answer it 
positively: 

Can we devise variants of HMAC/NMAC whose security provably 
degrades gracefully with an increasing number of compression function 
queries $q_f$, possibly retaining security for $q_f$ being much larger than $2^c$? 

The main contribution of this paper is the introduction and analysis of a 
variant of NMAC (which we then adapt to the HMAC setting, as described 
below) which uses additional key material to “whiten” message blocks before 
they are processed by the compression function. Concretely, our construction, 
termed WNMAC (for “whitened NMAC”), uses an additional $b$-bit key $K_w$, 
and given a message $M$ padded as $M[1], \ldots, M[\ell]$, operates as NMAC on input 
padded to blocks $M'[i] = M[i] \oplus K_w$, i.e., every message block is whitened with 
the same key (see also Fig. 1). 

² Here we refer to Theorem 2 in [7] that formally considers a related construction 
NI in the standard model. However, its proof starts by a transition to the 
ideal-model analysis of a construction very closely related to NMAC, while 
disallowing compression-function queries. 
The rationale behind WNMAC is two-fold. First, from the security viewpoint, 
the justification comes from the rich line of research on generic attacks on 
hash-based MACs. Most recent attacks [10,15,19,20] exploit the so-called “functional 
graph” of the compression function f, i.e., the graph capturing the structure of 
f when repeatedly invoked with its $b$-bit input fixed to some constant (say $0^b$). 
Since our whitening denies the adversary the knowledge of $b$-bit inputs on which 
f is invoked during construction queries, intuitively it seems to be the right way 
to foil such attacks. Moreover, a recent work by Sasaki and Wang [22] suggests 
that keying every invocation of f is necessary in order to prevent suboptimal 
security against generic state recovery attacks. WNMAC arguably provides the 
simplest and most natural such keying. Second, from the practical perspective, 
WNMAC can be implemented on top of an existing implementation of NMAC, 
using it as a black-box. 

PRF-Security of WNMAC. Our main result shows that WNMAC is a secure 
PRF; more precisely, no attacker making at most $q_C$ construction queries (for 
messages padded into at most $\ell$ blocks) and $q_f$ primitive queries can distinguish 
WNMAC from a random function, except with distinguishing advantage 

$$\epsilon_{\mathrm{WNMAC}}(q_C, q_f, \ell) \le \frac{q_C q_f}{2^{2c}} + \frac{\ell q_C q_f}{2^{b+c}} + \frac{\ell q_C^2}{2^c} \cdot \bigl(d'(\ell) + \ldots\bigr).$$

Here, $d'(\ell)$ is the maximum, over all positive integers $\ell' \le \ell$, of the 
number of positive divisors of $\ell'$, and grows very slowly, i.e., 
$d'(\ell) \approx \ell^{1/\ln\ln \ell}$. We also prove that this bound is essentially 
tight. Namely, we give an attack that achieves advantage roughly $q_C q_f/2^{2c}$, 
showing the first term above to be necessary. Additionally, we know from [7] that 
the third term is tight for $\ell \le 2^{c/3}$. 

Note that in the case of $q_f = 0$, the bound matches exactly the bound from [7]. 
Moreover, observe that under the realistic assumption that 
$\ell \le \min\{2^{c/3}, 2^{b-c}\}$, the bound simplifies to 

$$\epsilon_{\mathrm{WNMAC}}(q_C, q_f, \ell) \le \frac{2 q_C q_f}{2^{2c}} + (d'(\ell) + 2) \cdot \frac{\ell q_C^2}{2^c}.$$

Ignoring $d'(\ell)$ for simplicity, we see that we can tolerate up to 
$q_C \approx 2^{c/2}/\sqrt{\ell}$ construction queries and up to $q_f \approx 2^{1.5c}$ 
primitive queries. This corresponds to the security threshold ranging from $2^{192}$ 
f-queries for MD5 up to $2^{768}$ f-queries for SHA-512. The first term also clearly 
characterizes the complete trade-off curve between $q_C < 2^{c/2}/\sqrt{\ell}$ and 
$q_f$ for any reasonable upper bound on the message length and acceptable 
distinguishing advantage. 
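
As a rough numerical illustration of this trade-off, the following sketch evaluates only the first term $q_C q_f/2^{2c}$ of the bound, ignoring constants, $d'(\ell)$ and the remaining terms. The function name and the table of chaining-value lengths are illustrative choices and not notation from the paper.

```python
# Chaining-value length c (in bits) of some common MD-based hash functions.
CHAINING_BITS = {"MD5": 128, "SHA-1": 160, "SHA-256": 256, "SHA-512": 512}

def tolerable_log2_qf(c: int, log2_qc: float) -> float:
    """log2 of the largest q_f for which q_C * q_f / 2^(2c) stays at most 1.

    Only the first term of the bound is considered (a simplifying assumption).
    """
    return 2 * c - log2_qc

for name, c in CHAINING_BITS.items():
    # At the birthday-type limit q_C = 2^(c/2), q_f may reach about 2^(1.5c).
    print(f"{name}: q_f up to about 2^{tolerable_log2_qf(c, c / 2):.0f}")
```

Lowering the number of construction queries below $2^{c/2}$ moves along the same curve: every bit removed from $\log_2 q_C$ adds one bit to the tolerable $\log_2 q_f$.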

Other Security Properties. Additionally, we also analyze the security level 
WNMAC achieves with respect to other security notions frequently considered in 
the attacks literature. By a series of reductions, we show that, roughly speaking, 
$\epsilon_{\mathrm{WNMAC}}$ also upper-bounds the adversary’s advantage for 
distinguishing-H and state recovery. We believe that addressing these cryptanalytic notions also using 
the traditional toolbox of provable security is important and see this paper as 
taking the first step on that path. 
Lifting to HMAC. We then move our attention from NMAC to HMAC and propose 
two analogous modifications to it. The first one, called WHMAC, is obtained 
from HMAC in the same way WNMAC is obtained from NMAC: by whitening 
the padded message blocks with an independent key. The second one, termed 
WHMAC$^+$, additionally processes a fresh key $K^+$ instead of the first block of the 
message. Both variants can be implemented given only black-box access to HMAC, 
and we prove that they maintain the same security level as WNMAC as long as the 
parameters $b, c$ of f satisfy $b \gg 2c$ (for WHMAC) or $b \gg c$ (for WHMAC$^+$). Note 
that for existing hash functions, the former condition is satisfied for both MD5 and 
SHA-1, while the latter holds also for SHA-256 and SHA-512. 

The Dual Construction. Motivated by the most restrictive term $q_C q_f/2^{2c}$ in 
$\epsilon_{\mathrm{WNMAC}}$, the final construction we propose in this paper is a “dual” 
version of WNMAC denoted DWNMAC, that differs in the final, outer f-call. Instead of 
$f(K_2, s \,\|\, 0^{b-c})$ for a $c$-bit key $K_2$ and a $c$-bit state $s$ padded with 
zeroes, the outer call in DWNMAC computes $f(s, K_2)$ for a longer, $b$-bit key. As 
expected, we prove that this tweak removes the need for the $q_C q_f/2^{2c}$ term and 
replaces it by the strictly more favourable term $q_C q_f/2^{b+c}$, proving that the 
zero-padding in the outer call of WNMAC was actually responsible for the 
“bottle-neck” term in its security bound. 
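
The difference between the two outer calls can be sketched as follows. The compression function here is a toy stand-in built from truncated SHA-256, an assumption made purely so that the example runs (the analysis in this paper models f as ideal); the parameter sizes and function names are likewise illustrative.

```python
import hashlib

C, B = 16, 64  # toy sizes in bytes: c = 128 bits, b = 512 bits (illustrative)

def f(chain: bytes, block: bytes) -> bytes:
    """Toy stand-in for f: {0,1}^c x {0,1}^b -> {0,1}^c (not a real compression function)."""
    assert len(chain) == C and len(block) == B
    return hashlib.sha256(chain + block).digest()[:C]

def outer_wnmac(k2: bytes, s: bytes) -> bytes:
    """WNMAC outer call: c-bit key K_2 on the chaining input, state zero-padded to b bits."""
    return f(k2, s + bytes(B - C))

def outer_dwnmac(k2: bytes, s: bytes) -> bytes:
    """DWNMAC outer call: state s on the chaining input, b-bit key K_2 on the block input."""
    return f(s, k2)
```

In the first variant only $c$ bits of the outer input are secret, while in the dual variant the whole $b$-bit block input is secret, which matches the change of the denominator from $2^{2c}$ to $2^{b+c}$.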

Our Techniques. In our information-theoretic analysis of WNMAC we employ 
the H-coefficient technique by Patarin [18], partially inheriting the notational 
framework from the recent analysis of keyed sponges by Gaži, Pietrzak, and 
Tessaro [8]. On a high level, the heart of our proof is a careful analysis of the 
probability that two sets intersect in the ideal experiment: (1) the set of 
adversarial queries to f, and (2) the set of inputs on which f is invoked when 
answering the adversary’s queries to WNMAC. Obtaining a bound on the probability of 
this event then allows us to exclude it and use the result from [7] that considers 
$q_f = 0$, properly adapted to the WNMAC setting. 

Related Work. As mentioned above, the motivation for our work partially 
stems from the recent line of work on generic attacks against iterated hash-based 
MACs [5,10,15,17,19,20,22]. While our security bound for WNMAC does not 
exclude attacks of the complexity (in terms of numbers of queries and message 
lengths) considered in these papers, the design of WNMAC was partially guided 
by the structure of these attacks and seems to prevent them. We find in particular 
the work [22] to be a good justification for investigating the security of WNMAC 
and related constructions. An iterated MAC that uses keying in every f-invocation 
was already considered by An and Bellare [1]; their construction NI was later 
subject to the analysis in [7], which we adapt and reuse. One can see WNMAC as a 
conceptual simplification of NI where the key is simply used to whiten the $b$-bit 
input to the compression function. Finally, our dual construction considered in 
Sect. 5 bears resemblance to the Sandwich MAC analyzed by Yasuda [23]; we believe 
that our methods could be easily adapted to cover this construction as well. 

Perspective and Open Problems. We stress that the reader should not con- 
clude from this work that NMAC and HMAC are necessarily less secure than the 
constructions proposed in this paper, specifically with respect to PRF security. 
In fact, we are not aware of any attacks showing a separation between the PRF 
security of our constructions and that of the original NMAC/HMAC constructions; 
finding one is an interesting open problem. 

While obtaining a non-tight birthday-type bound for NMAC/HMAC is feasible 
(for most key-length values, a bound follows directly from the indifferentiability 
analysis of [6]), proving tight bounds in terms of compression function and 
construction queries on the generic PRF security of NMAC/HMAC is a challenging 
open problem, on which little progress has been made. The main challenge is 
to understand how partial information in the form of f-queries can help the attacker 
to break security (i.e., distinguish) in settings with $q_C \ll 2^{c/2}/\sqrt{\ell}$, 
when the attack from [7] does not apply. This will require in particular developing 
a better understanding of the functional graph defined by queries to the function f. 
Some of its properties have been indeed exploited in existing generic attacks, 
but proving security appears to require a much deeper understanding: Most of 
the recent attacks, which are probably still not tight, do not come with rigorous 
proofs but instead rely on conjectures on the structure of these graphs [10]. The 
difficulty of this question for NMAC/HMAC is also well documented by the fact 
that even proving security of the whitened constructions presented in this paper 
required some novel tricks and considerable effort. 

Similarly, it remains equally challenging to prove that for the properties con- 
sidered by the recent HMAC/NMAC attacks (such as distinguishing-H, state 
recovery or various types of forgeries), the security of WNMAC/WHMAC is prov- 
ably superior. Yet, we note that our construction invalidates direct application 
of all existing attacks, and hence we feel confident conjecturing that its security 
is much higher. 

Black-box Instantiations. Throughout the paper we implicitly assume we 
can add a key to each $b$-bit input block, even though we aim for a black-box 
instantiation. For many MD-based hash functions, such fine-grained control of 
the input to the compression function is generally not possible via a black-box 
message pre-processing. Concretely, the functions from the SHA-family with 
512-bit blocks only allow to effectively control (via alterations of the message) the 
first 447 bits of the last block, since the remaining 65 bits are reserved for the 
64-bit length, and an additional 1-bit. Our analysis can be easily modified to take 
this into account. The resulting bound will change very little, and will result in 
the term $\ell q_C q_f/2^{b+c}$ being replaced by the term 
$(\ell - 1 + 2^d) \cdot q_C q_f/2^{b+c}$, where $d$ is the length of the 
non-controllable part of the input (for SHA-functions, $d = 65$). 
Note that since $d \ll b - c$, this will not affect the tightness of the bounds for 
concrete parameters. 

2 Preliminaries 

Basic Notation. We denote $[n] := \{1, \ldots, n\}$. Moreover, for a finite set 
$\mathcal{S}$ (e.g., $\mathcal{S} = \{0,1\}$), we let $\mathcal{S}^n$, $\mathcal{S}^+$ 
and $\mathcal{S}^*$ be the sets of sequences of elements of $\mathcal{S}$ of length 
$n$, of arbitrary (but non-zero) length, and of arbitrary length, respectively (with 
$\varepsilon$ denoting the empty sequence, as opposed to $\epsilon$ which is a 
small quantity). As a shorthand, let $\{0,1\}^{b*}$ denote $(\{0,1\}^b)^*$. We denote 
by $S[i]$ the $i$-th element of $S \in \mathcal{S}^n$ for all $i \in [n]$. Similarly, 
we denote by $S[i \ldots j]$, for every $1 \le i \le j \le n$, the sub-sequence 
consisting of $S[i], S[i+1], \ldots, S[j]$, with the convention that 
$S[i \ldots i] = S[i]$. Moreover, we denote by $S \,\|\, S'$ the concatenation of two 
sequences in $\mathcal{S}^*$, and also, we let $S \mid T$ be the usual prefix-of 
relation: $S \mid T :\Leftrightarrow (\exists S' \in \mathcal{S}^* : S \,\|\, S' = T)$. 

For an integer $n$, $d(n) = |\{i \in \mathbb{N} : i \mid n\}|$ is the number of its 
positive divisors and 

$$d'(n) := \max_{n' \in \{1,\ldots,n\}} |\{d \in \mathbb{N} : d \mid n'\}| \approx n^{1/\ln\ln n}$$

is the maximum, over all positive integers $n' \le n$, of the number of positive 
divisors of $n'$. More precisely, we have 
$\forall \epsilon > 0 \ \exists n_0 \ \forall n > n_0 : d(n) < n^{(1+\epsilon)/\ln\ln n}$ [11]. 
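
To make the two divisor functions concrete, here is a direct (and deliberately naive) computation of $d(n)$ and $d'(n)$. The names `d` and `d_prime` are chosen for this sketch only; the brute-force loops are meant for small inputs.

```python
def d(n: int) -> int:
    """Number of positive divisors of n."""
    return sum(1 for i in range(1, n + 1) if n % i == 0)

def d_prime(n: int) -> int:
    """Maximum of d(n') over all positive integers n' <= n."""
    return max(d(k) for k in range(1, n + 1))

# d' is non-decreasing and stays small even for fairly long messages.
for n in (1, 12, 13, 100, 1000):
    print(n, d(n), d_prime(n))
```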

We also let $\mathcal{F}(\mathcal{D}, \mathcal{R})$ be the set of all functions from 
$\mathcal{D}$ to $\mathcal{R}$; and with a slight abuse of notation we sometimes write 
$\mathcal{F}(m,n)$ (resp. $\mathcal{F}(*,n)$) to denote the set of functions mapping 
$m$-bit strings to $n$-bit strings (resp. from $\{0,1\}^*$ to $\{0,1\}^n$). We denote 
by $x \leftarrow \mathcal{X}$ the act of sampling $x$ uniformly at random from 
$\mathcal{X}$. 

Furthermore, we denote the event that an adversary A, given access to an oracle O, 
outputs a value $y$, as $\mathrm{A}^{\mathrm{O}} \Rightarrow y$. To emphasize the random 
experiment considered, we sometimes denote the probability of an event $A$ in a 
random experiment $E$ by $P^{E}[A]$. Finally, the min-entropy $H_\infty(X)$ of a 
random variable $X$ with range $\mathcal{X}$ is defined as 
$-\log(\max_{x \in \mathcal{X}} P_X(x))$. 

Pseudorandom Functions. We consider keyed functions 
$F : \mathcal{K} \times \mathcal{D} \to \mathcal{R}$ taking a $\kappa$-bit key (i.e., 
$\mathcal{K} = \{0,1\}^\kappa$), a message $M \in \mathcal{D}$ as input, and returning 
an output from $\mathcal{R}$. For a keyed function $F$ under a key 
$k \in \mathcal{K}$ we often write $F_k(\cdot)$ instead of $F(k, \cdot)$. One often 
considers the security of $F$ as a pseudorandom function (or PRF, for short) [9]. 
This is defined via the following advantage measure, involving an adversary A: 

$$\mathrm{Adv}^{\mathrm{prf}}_{F}(\mathrm{A}) := P\bigl[k \leftarrow \{0,1\}^\kappa : \mathrm{A}^{F_k} \Rightarrow 1\bigr] - P\bigl[f \leftarrow \mathcal{F}(\mathcal{D}, \mathcal{R}) : \mathrm{A}^{f} \Rightarrow 1\bigr].$$

Informally, we say that $F$ is a PRF if this advantage is “negligible” for all 
“efficient” adversaries A. 

PRFs in the Ideal Compression Function Model. For our analysis below, 
we are going to consider keyed constructions 
$C[f] : \{0,1\}^\kappa \times \mathcal{D} \to \mathcal{R}$ which make 
queries to a randomly chosen compression function $f \leftarrow \mathcal{F}(c+b, c)$ 
which can also be evaluated by the adversary (we sometimes write $C^f$ instead of 
$C[f]$). For this case, we use the following notation to express the PRF advantage 
of A: 
We call A’s queries to its first oracle construction queries (or C-queries) and its 
queries to the second oracle primitive queries (or f-queries). 

Note that the notion of PRF-security is identical to the notion of 
distinguishing-R, first defined in [13] and often used in the cryptanalytic 
literature on hash-based MACs. 

Distinguishing-H. A further security notion defined in [13] is the so-called 
distinguishing-H security. Here, the goal of the adversary is to distinguish the 
hash-based MAC construction $C_K[f]$ using its underlying compression function f 
(say SHA-1) and a random key $K$, from the same construction $C_K[g]$ built on top 
of an independent random compression function g. In the ideal compression function 
model, where we model already the initial compression function f as ideal, 
this corresponds to distinguishing a pair of oracles $(C_K[f], f)$ from 
$(C_K[f], g)$. Formally, 

$$\mathrm{Adv}^{\mathrm{dist\text{-}H}}_{C}(\mathrm{A}) := P\bigl[K \leftarrow \{0,1\}^\kappa, f \leftarrow \mathcal{F}(c+b,c) : \mathrm{A}^{C_K[f],\, f} \Rightarrow 1\bigr] - P\bigl[K \leftarrow \{0,1\}^\kappa, f, g \leftarrow \mathcal{F}(c+b,c) : \mathrm{A}^{C_K[f],\, g} \Rightarrow 1\bigr].$$


State Recovery. An additional notion considered in the literature is security 
against state recovery. Since the definition of this notion needs to be tailored for 
the concrete construction it is applied to, we postpone the formal definition of 
security against state recovery to Sect. 3.10. 

MACs and Unpredictability. It is well known that a good PRF also yields 
a good message-authentication code (MAC). A concrete security bound for 
unforgeability can be obtained from the PRF bound via a standard argument. 

Iterated MACs. For a keyed function $f : \{0,1\}^c \times \{0,1\}^b \to \{0,1\}^c$ we 
denote with $\mathrm{Casc}^f : \{0,1\}^c \times \{0,1\}^{b*} \to \{0,1\}^c$ the cascade 
construction (also known as Merkle-Damgård) built from f as 

$$\mathrm{Casc}^f(K, m_1 \,\|\, \ldots \,\|\, m_\ell) := y_\ell \quad \text{where } y_0 := K \text{ and for } i \ge 1 : y_i := f(y_{i-1}, m_i),$$

in particular $\mathrm{Casc}^f(K, \varepsilon) := K$. 

The construction $\mathrm{NMAC}^f : (\{0,1\}^c)^2 \times \{0,1\}^{b*} \to \{0,1\}^c$ is 
derived from $\mathrm{Casc}^f$ by adding an additional, independently keyed application 
of f at the end. It assumes that the domain sizes of f satisfy $b \ge c$ and the 
output of the cascade is padded with zeroes before the last f-call. Formally, 

$$\mathrm{NMAC}^f((K_1, K_2), M) := f(K_2, \mathrm{Casc}^f(K_1, M) \,\|\, 0^{b-c}).$$
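
The two definitions above translate almost literally into code. In the sketch below the compression function is a toy stand-in based on truncated SHA-256 (an assumption for illustration only; the paper treats f as an ideal function), and the block sizes and function names are chosen for this example.

```python
import hashlib

C, B = 16, 64  # toy sizes in bytes: c = 128 bits, b = 512 bits (illustrative)

def f(chain: bytes, block: bytes) -> bytes:
    """Toy stand-in for f: {0,1}^c x {0,1}^b -> {0,1}^c."""
    assert len(chain) == C and len(block) == B
    return hashlib.sha256(chain + block).digest()[:C]

def casc(key: bytes, blocks: list) -> bytes:
    """Cascade: y_0 = K, y_i = f(y_{i-1}, m_i); the empty message returns K."""
    y = key
    for m in blocks:
        y = f(y, m)
    return y

def nmac(k1: bytes, k2: bytes, blocks: list) -> bytes:
    """NMAC: outer call f(K_2, Casc(K_1, M) || 0^(b-c))."""
    return f(k2, casc(k1, blocks) + bytes(B - C))
```

Messages are passed as lists of already padded $b$-bit blocks, mirroring the shortcut taken in the text of leaving padding outside the definition.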

Note that practical MD-based hash functions take as input arbitrary-length bit- 
strings and then pad them to a multiple of the block length, often including 
the message length in the so-called MD-strengthening. This padding then also 
appears in NMAC (and HMAC) but here we take the customary shortcut and 
our definition of NMAC above (resp. HMAC below) actually corresponds to the 
generalized constructions denoted as GNMAC (resp. GHMAC) in [2] where this 
step is also justified in detail. 

$\mathrm{HMAC}^f$ is a practice-oriented version of $\mathrm{NMAC}^f$, where the two 
keys $(K_1, K_2)$ are derived from a single key $K \in \{0,1\}^b$ by xor-ing it with 
two fixed $b$-bit strings ipad and opad. In addition, the keys are not given through 
the key-input of the compression function f, but are prepended to the message 
instead. This allows for the usage of existing implementations of hash functions 
that contain a hard-coded initialization vector IV. Formally: 

$$\mathrm{HMAC}^f(K, m) := \mathrm{Casc}^f(\mathrm{IV}, K_2 \,\|\, \mathrm{Casc}^f(\mathrm{IV}, K_1 \,\|\, m) \,\|\, \mathrm{fpad})$$

where $(K_1, K_2) := (K \oplus \mathrm{ipad}, K \oplus \mathrm{opad})$ 

and fpad is a fixed $(b - c)$-bit padding not affecting the security analysis. 
(Technically, [14] allows for arbitrary length of the key $K$: a key shorter than 
$b$ bits is padded with zeroes before applying the xor transformations, a longer 
key is first hashed.) 
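
The nested structure can be checked against a real hash function. The sketch below rebuilds HMAC-SHA-256 from two plain SHA-256 calls and compares it with Python's standard `hmac` module. Here the padding that the text abstracts away (including fpad) is applied internally by SHA-256; the function name is chosen for this example.

```python
import hashlib
import hmac

def hmac_sha256_sketch(key: bytes, msg: bytes) -> bytes:
    """HMAC rebuilt from its definition: H((K xor opad) || H((K xor ipad) || m))."""
    block = 64  # b = 512 bits for SHA-256
    if len(key) > block:
        key = hashlib.sha256(key).digest()  # longer keys are first hashed
    key = key.ljust(block, b"\x00")         # shorter keys are zero-padded
    k1 = bytes(x ^ 0x36 for x in key)       # K_1 = K xor ipad
    k2 = bytes(x ^ 0x5C for x in key)       # K_2 = K xor opad
    inner = hashlib.sha256(k1 + msg).digest()
    return hashlib.sha256(k2 + inner).digest()

tag = hmac_sha256_sketch(b"key", b"message")
print(hmac.compare_digest(tag, hmac.new(b"key", b"message", hashlib.sha256).digest()))
```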


3 The Whitened NMAC Construction 


We now present our main construction called Whitened NMAC (or WNMAC 
for short). To that end, let us first consider a modification of the cascade 
construction Casc called whitened cascade and denoted WCasc. For a keyed function 
$f : \{0,1\}^c \times \{0,1\}^b \to \{0,1\}^c$ we denote with 
$\mathrm{WCasc}^f : (\{0,1\}^c \times \{0,1\}^b) \times \{0,1\}^{b*} \to \{0,1\}^c$ 
the whitened cascade construction built from f as 

$$\mathrm{WCasc}^f((K_1, K_w), m_1 \,\|\, \ldots \,\|\, m_\ell) := y_\ell$$

$$\text{where } y_0 := K_1 \text{ and for } i \ge 1 : y_i := f(y_{i-1}, m_i \oplus K_w),$$

in particular $\mathrm{WCasc}^f((K_1, K_w), \varepsilon) := K_1$. 

The construction WNMAC is derived from NMAC, the only difference being 
that the inner cascade Casc is replaced by the whitened cascade WCasc. More 
precisely, 

$$\mathrm{WNMAC}^f((K_1, K_2, K_w), M) := f(K_2, \mathrm{WCasc}^f((K_1, K_w), M) \,\|\, 0^{b-c}).$$
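
A minimal sketch of WNMAC, together with the black-box view (whiten each block, then call NMAC), is given below. As before, the compression function is a toy stand-in based on truncated SHA-256 and all names and sizes are assumptions made for this illustration.

```python
import hashlib

C, B = 16, 64  # toy sizes in bytes: c = 128 bits, b = 512 bits (illustrative)

def f(chain: bytes, block: bytes) -> bytes:
    """Toy stand-in for f: {0,1}^c x {0,1}^b -> {0,1}^c."""
    assert len(chain) == C and len(block) == B
    return hashlib.sha256(chain + block).digest()[:C]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def nmac(k1: bytes, k2: bytes, blocks: list) -> bytes:
    y = k1
    for m in blocks:
        y = f(y, m)
    return f(k2, y + bytes(B - C))

def wnmac(k1: bytes, k2: bytes, kw: bytes, blocks: list) -> bytes:
    """WNMAC: whitened cascade y_i = f(y_{i-1}, m_i xor K_w), then the NMAC outer call."""
    y = k1
    for m in blocks:
        y = f(y, xor(m, kw))
    return f(k2, y + bytes(B - C))

def wnmac_blackbox(k1: bytes, k2: bytes, kw: bytes, blocks: list) -> bytes:
    """Same function, using NMAC as a black box on pre-whitened blocks."""
    return nmac(k1, k2, [xor(m, kw) for m in blocks])
```

Both functions compute the same value, which is the sense in which WNMAC can be layered on top of an existing NMAC implementation.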


For a graphical depiction of WNMAC, see Fig. 1. We devote most of this section to 
the proof of the following theorem that quantifies the PRF-security of WNMAC. 


Theorem 1 (PRF-Security of WNMAC). Let A be an adversary making 
at most $q_f$ queries to the compression function f and at most $q_C$ construction 
queries, each of length at most $\ell$ $b$-bit blocks. Let 
$K = (K_1, K_2, K_w) \in \{0,1\}^c \times \{0,1\}^c \times \{0,1\}^b$ be a tuple of 
random keys. Then we have 

$$\mathrm{Adv}^{\mathrm{prf}}_{\mathrm{WNMAC}^f_K}(\mathrm{A}) \le \frac{q_C q_f}{2^{2c}} + \frac{\ell q_C q_f}{2^{b+c}} + \frac{\ell q_C^2}{2^c} \cdot \bigl(d'(\ell) + \ldots\bigr). \qquad (2)$$


Note that as observed in Sect. 2, this also covers the so-called 
distinguishing-R security of WNMAC. Moreover, our analysis also implies security 
bounds for distinguishing-H and state recovery, as we discuss later. 
Fig. 1. The construction $\mathrm{WNMAC}[f]_{K_1,K_2,K_w}$. 

3.1 Basic Notation, Message Trees and Repetition Patterns 

Let us fix an adversary A. We assume that A is deterministic, it makes exactly 
$q_f$ queries to f and $q_C$ construction queries, and it never repeats the same query 
twice. All these assumptions are without loss of generality for an information- 
theoretic indistinguishability analysis, since an arbitrary (possibly randomized) 
adversary making at most this many queries can be transformed into one satis- 
fying the above constraints and achieving advantage which is at least as large. 

Let $\mathcal{Q}_C \subseteq (\{0,1\}^b)^*$ be any non-empty set of messages (later 
this will represent the set of A’s C-queries). Based on it, we now introduce the 
message tree and its labeled version, which capture the inherent combinatorial 
structure of the messages $\mathcal{Q}_C$, as well as the internal values computed 
while these messages are processed by $\mathrm{WCasc}^f$ inside of 
$\mathrm{WNMAC}^f$. The message tree $T(\mathcal{Q}_C) = (V, E)$ for 
$\mathcal{Q}_C$ is defined as follows: 

- The vertex set is 
  $V := \{M' \in (\{0,1\}^b)^* : \exists M \in \mathcal{Q}_C : M' \mid M\}$, where 
  $\mid$ is the prefix-of partial ordering of strings. In particular, note that the 
  empty string $\varepsilon$ is a vertex and that $\mathcal{Q}_C \subseteq V$. 

- The set $E \subseteq V \times V$ of (directed) edges is 

  $E := \{(M, M') : \exists m \in \{0,1\}^b : M' = M \,\|\, m\}$. 

To simplify our exposition, we also define the following two mappings based on 
$T(\mathcal{Q}_C)$. 

- The mapping $\pi(v) : V \setminus \{\varepsilon\} \to V$ returns the unique parent 
  node of $v \in V \setminus \{\varepsilon\}$; i.e., the unique node $u$ such that 
  $(u, v) \in E$. 

- The mapping $\mu(v) : V \setminus \{\varepsilon\} \to \{0,1\}^b$ returns the unique 
  message block $m \in \{0,1\}^b$ such that $\pi(v) \,\|\, \mu(v) = v$ (intuitively, 
  this will be the message block that is processed when “arriving” in vertex $v$). 

Alternatively, with a slight abuse of notation we will also refer to the vertices 
in $V$ as $v_1, \ldots, v_{|V|}$, which is an arbitrary ordering of them such that for 
all $1 \le i, j \le |V|$ it satisfies $v_i \mid v_j \Rightarrow i \le j$. Note that one 
obtains such an ordering for example if one, intuitively speaking, processes the 
messages in $\mathcal{Q}_C$ block-wise and labels the vertices by their “first 
appearance”: in particular $v_1 = \varepsilon$ is the tree root. 
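
The message tree and the two mappings can be computed directly from the set of queries. In the sketch below a message is represented as a tuple of blocks and a vertex as a tuple prefix; this representation, the function name, and the choice of ordering (by length, which is one valid prefix-respecting ordering) are assumptions made for illustration.

```python
def message_tree(queries):
    """Return (order, edges, parent, block) for the message tree T(Q_C).

    order  : vertices v_1, ..., v_|V|, with every prefix listed before its extensions
    edges  : set of (parent, child) pairs
    parent : the mapping pi on non-root vertices
    block  : the mapping mu on non-root vertices
    """
    vertices = {tuple(M[:i]) for M in queries for i in range(len(M) + 1)}
    order = sorted(vertices, key=lambda v: (len(v), v))
    parent = {v: v[:-1] for v in order if v}
    block = {v: v[-1] for v in order if v}
    edges = {(parent[v], v) for v in parent}
    return order, edges, parent, block

# Four queries with the same prefix structure as the example of Fig. 2,
# using the integers 0 and 1 in place of whole b-bit blocks.
order, edges, parent, block = message_tree([(0,), (0, 0), (0, 1, 1), (1,)])
print(order)
```

Note that the vertex `(0, 1)` appears in the tree although it is not itself a query, since it is a prefix of the query `(0, 1, 1)`.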
Fig. 2. Labeled message tree. Example of a labeled message tree 
$T^f_K(\mathcal{Q}_C)$ for four messages 
$\mathcal{Q}_C = \{\mathbf{0}, \mathbf{0} \,\|\, \mathbf{0}, \mathbf{0} \,\|\, \mathbf{1} \,\|\, \mathbf{1}, \mathbf{1}\}$, 
where $\mathbf{r} = r^b$ for $r \in \{0,1\}$. The gray vertices correspond to these 
four messages. Next to each vertex $v$ and edge $(u, v)$, we give the label 
$\lambda(v)$ and the value $\mu(v)$, respectively. 


Additionally, for a mapping $f : \{0,1\}^c \times \{0,1\}^b \to \{0,1\}^c$ and a key 
tuple $K = (K_1, K_2, K_w) \in \{0,1\}^c \times \{0,1\}^c \times \{0,1\}^b$ we also 
consider an extended version of $T(\mathcal{Q}_C)$ which we call the labeled message 
tree and denote $T^f_K(\mathcal{Q}_C) = (V, E, \lambda)$, and which is defined as 
follows: 

- The set of vertices $V$ and edges $E$ are defined exactly as for 
  $T(\mathcal{Q}_C)$ above. 

- The vertex-labeling function $\lambda : V \to \{0,1\}^c$ is defined iteratively: 
  $\lambda(\varepsilon) := K_1$ and for each non-root vertex 
  $v \in V \setminus \{\varepsilon\}$ we put 
  $\lambda(v) := f(\lambda(\pi(v)), \mu(v) \oplus K_w)$. 

An example of a labeled message tree is given in Fig. 2. Note that each vertex 
label $\lambda(v)$ is exactly the output of the inner, whitened cascade 
$\mathrm{WCasc}^f_{K_1,K_w}(v)$ in $\mathrm{WNMAC}^f_K$ (recall that $v$ is actually a 
message from $\{0,1\}^{b*}$). 

For any message tree $T(\mathcal{Q}_C) = (V, E)$, a repetition pattern is any 
equivalence relation $\rho$ on $V$. For a labeled message tree 
$T^f_K(\mathcal{Q}_C) = (V, E, \lambda)$ we say that a repetition pattern $\rho$ is 
induced by it if it satisfies 

$$\forall u, v \in V : \lambda(u) = \lambda(v) \Leftrightarrow \rho(u, v).$$
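
The labeling and the repetition pattern it induces can be sketched as follows. The compression function is again a toy stand-in (truncated SHA-256), vertices are tuples of blocks, and the pattern is returned as a list of equivalence classes; all of these are illustrative assumptions rather than part of the paper's formalism.

```python
import hashlib
from collections import defaultdict

C, B = 16, 64  # toy sizes in bytes: c = 128 bits, b = 512 bits (illustrative)

def f(chain: bytes, block: bytes) -> bytes:
    """Toy stand-in for f: {0,1}^c x {0,1}^b -> {0,1}^c."""
    return hashlib.sha256(chain + block).digest()[:C]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def labeled_tree(queries, k1: bytes, kw: bytes):
    """Compute the labels lambda(v) of all vertices of the message tree."""
    vertices = {tuple(M[:i]) for M in queries for i in range(len(M) + 1)}
    order = sorted(vertices, key=lambda v: (len(v), v))  # parents come first
    labels = {(): k1}                                     # lambda(root) = K_1
    for v in order[1:]:
        # lambda(v) = f(lambda(pi(v)), mu(v) xor K_w)
        labels[v] = f(labels[v[:-1]], xor(v[-1], kw))
    return order, labels

def induced_pattern(labels):
    """Group vertices by label: u and v are related iff lambda(u) = lambda(v)."""
    classes = defaultdict(set)
    for v, lab in labels.items():
        classes[lab].add(v)
    return list(classes.values())
```

With a $c$-bit label space as large as in this toy setting, label collisions are not expected for a handful of vertices, so the induced pattern typically consists of singleton classes.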


3.2 Interactions and Transcripts 

Let $\mathcal{QR}_C$ denote the set of $q_C$ pairs $(x, r)$ such that 
$x \in \{0,1\}^{b*}$ is a construction query and $r \in \{0,1\}^c$ is a potential 
response to it (what we mean by “potential” will be clear from below). Similarly 
let $\mathcal{QR}_f$ denote the set of $q_f$ pairs $(x, r)$ such that 
$x \in \{0,1\}^c \times \{0,1\}^b$ is an f-query and $r \in \{0,1\}^c$ is a potential 
response to it. Let $\mathcal{Q}_C \subseteq \{0,1\}^{b*}$ and 
$\mathcal{Q}_f \subseteq \{0,1\}^c \times \{0,1\}^b$ denote the sets of first 
coordinates (i.e., the queries) in $\mathcal{QR}_C$ and $\mathcal{QR}_f$, 
respectively; we have $|\mathcal{Q}_C| = q_C$ and $|\mathcal{Q}_f| = q_f$. 

We call the pair of sets $(\mathcal{QR}_C, \mathcal{QR}_f)$ valid if the adversary A 
would indeed ask these queries throughout the experiment, assuming that each of her 
queries would be replied by the respective response in $\mathcal{QR}_C$ or 
$\mathcal{QR}_f$ (note that once a deterministic A is fixed, this determines whether 
a given pair $(\mathcal{QR}_C, \mathcal{QR}_f)$ is valid). 

We then define a valid transcript to be of the form 

$$\tau = \bigl(\mathcal{QR}_C, \mathcal{QR}_f, K = (K_1, K_2, K_w), T^f_K(\mathcal{Q}_C)\bigr),$$

where $(\mathcal{QR}_C, \mathcal{QR}_f)$ is valid, 
$f : \{0,1\}^c \times \{0,1\}^b \to \{0,1\}^c$ is a function and 
$K = (K_1, K_2, K_w) \in \{0,1\}^c \times \{0,1\}^c \times \{0,1\}^b$ is a key tuple. 

We differentiate between the ways in which such valid transcripts are generated 
in the real and in the ideal worlds (or experiments), respectively, by defining 
corresponding distributions $T_{\mathrm{real}}$ and $T_{\mathrm{ideal}}$ over the set 
of valid transcripts: 

Real World. The transcript $T_{\mathrm{real}}$ for the adversary A is obtained by 
sampling $f \leftarrow \mathcal{F}(c+b, c)$ and 
$K = (K_1, K_2, K_w) \leftarrow \{0,1\}^c \times \{0,1\}^c \times \{0,1\}^b$, and 
letting $T_{\mathrm{real}}$ denote 

$$\bigl(\mathcal{QR}_C = \{(M_i, Y_i)\}_{i=1}^{q_C},\ \mathcal{QR}_f = \{(X_i, R_i)\}_{i=1}^{q_f},\ K = (K_1, K_2, K_w),\ T^f_K(\mathcal{Q}_C)\bigr),$$

where we execute A, which asks construction queries $M_1, \ldots, M_{q_C}$ answered 
with $Y_i := \mathrm{WNMAC}[f]_K(M_i)$ for all $i \in [q_C]$, and f-queries 
$X_1, \ldots, X_{q_f}$ answered with $R_i := f(X_i)$ for all $i \in [q_f]$ (note that 
the C-queries and f-queries may in general be interleaved adaptively, depending on 
A). Finally, we let $T^f_K(\mathcal{Q}_C)$ be the labeled message tree corresponding 
to $\mathcal{Q}_C$, f and $K$. 

Ideal World. The transcript $T_{\mathrm{ideal}}$ for the adversary A is obtained 
similarly to the above, but here, together with the random function 
$f \leftarrow \mathcal{F}(c+b, c)$ and the key tuple 
$K = (K_1, K_2, K_w) \leftarrow \{0,1\}^c \times \{0,1\}^c \times \{0,1\}^b$, we also 
sample $q_C$ independent random values $Y_1, \ldots, Y_{q_C} \in \{0,1\}^c$. Then we 
let $T_{\mathrm{ideal}}$ denote 

$$\bigl(\mathcal{QR}_C = \{(M_i, Y_i)\}_{i=1}^{q_C},\ \mathcal{QR}_f = \{(X_i, R_i)\}_{i=1}^{q_f},\ K = (K_1, K_2, K_w),\ T^f_K(\mathcal{Q}_C)\bigr),$$

where we execute A, answer each of its C-queries $M_i$ with $Y_i$ for all 
$i \in [q_C]$ and each of its f-queries $X_i$ with $R_i := f(X_i)$ for all 
$i \in [q_f]$. Then we let $T^f_K(\mathcal{Q}_C)$ be the labeled message tree 
corresponding to $\mathcal{Q}_C$, f and $K$. 

Later we refer to the above two random experiments as real and ideal, respectively. 
Note that the range of $T_{\mathrm{real}}$ is included in the range of 
$T_{\mathrm{ideal}}$ by definition, and that the range of $T_{\mathrm{ideal}}$ is 
easily seen to contain all valid transcripts. 

3.3 The H-Coefficient Method 

We upper-bound the advantage of A in distinguishing $\mathrm{WNMAC}[f]_K$ for 
$f \leftarrow \mathcal{F}(c+b, c)$ from a random function in terms of the statistical 
distance of the transcripts, i.e., 

$$\mathrm{Adv}^{\mathrm{prf}}_{\mathrm{WNMAC}}(\mathrm{A}) \le \mathrm{SD}(T_{\mathrm{real}}, T_{\mathrm{ideal}}) = \frac{1}{2} \sum_{\tau} \bigl| P[T_{\mathrm{real}} = \tau] - P[T_{\mathrm{ideal}} = \tau] \bigr|, \qquad (3)$$

where the sum is over all valid transcripts. This is because an adversary for 
$T_{\mathrm{real}}$ and $T_{\mathrm{ideal}}$, whose optimal advantage is exactly 
$\mathrm{SD}(T_{\mathrm{real}}, T_{\mathrm{ideal}})$, can always output the same 
decision bit as A, ignoring any extra information provided by the transcript. 

We are going to use Patarin’s H-coefficient method [18]. This means that we 
need to partition the set of valid transcripts into good transcripts GT and bad 
transcripts BT and then apply the following lemma. 


Lemma 1 (The H-Coefficient Method [18]). Let $\delta, \epsilon \in [0,1]$ be such 
that: 

(a) $P[T_{\mathrm{ideal}} \in \mathrm{BT}] \le \delta$. 

(b) For all $\tau \in \mathrm{GT}$, 

$$\frac{P[T_{\mathrm{real}} = \tau]}{P[T_{\mathrm{ideal}} = \tau]} \ge 1 - \epsilon.$$

Then, 

$$\mathrm{Adv}^{\mathrm{prf}}_{\mathrm{WNMAC}}(\mathrm{A}) \le \mathrm{SD}(T_{\mathrm{real}}, T_{\mathrm{ideal}}) \le \epsilon + \delta.$$

More verbally, we want a set of good transcripts GT such that with very high 
probability (i.e., $1 - \delta$) a generated transcript in the ideal world is going 
to be in this set, and moreover, for each such good transcript, the probabilities 
that it occurs in the real and in the ideal worlds are roughly the same, i.e., at 
most a multiplicative factor $1 - \epsilon$ apart. 
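
For intuition, the usual argument behind Lemma 1 can be sketched in a few lines. This is the standard textbook derivation and is not quoted from [18]; the shorthands $p_{\mathrm{real}}(\tau) := P[T_{\mathrm{real}} = \tau]$ and $p_{\mathrm{ideal}}(\tau) := P[T_{\mathrm{ideal}} = \tau]$ are introduced only for this sketch.

```latex
\begin{align*}
\mathrm{SD}(T_{\mathrm{real}}, T_{\mathrm{ideal}})
  &= \sum_{\tau \,:\, p_{\mathrm{ideal}}(\tau) > p_{\mathrm{real}}(\tau)}
     \bigl( p_{\mathrm{ideal}}(\tau) - p_{\mathrm{real}}(\tau) \bigr) \\
  &\le \sum_{\tau \in \mathrm{GT}} p_{\mathrm{ideal}}(\tau)
       \Bigl( 1 - \frac{p_{\mathrm{real}}(\tau)}{p_{\mathrm{ideal}}(\tau)} \Bigr)
     + \sum_{\tau \in \mathrm{BT}} p_{\mathrm{ideal}}(\tau) \\
  &\le \epsilon \sum_{\tau \in \mathrm{GT}} p_{\mathrm{ideal}}(\tau)
     + P[T_{\mathrm{ideal}} \in \mathrm{BT}]
  \;\le\; \epsilon + \delta .
\end{align*}
```

The first line uses the fact that the statistical distance equals the total excess of one distribution over the other; the second splits the transcripts into good and bad ones and applies condition (b) to the good part and condition (a) to the bad part.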


3.4 Good and Bad Transcripts 

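Given a valid transcript $\tau$ we define the sets 
$\mathcal{L}_{\mathrm{in}}, \mathcal{L}_{\mathrm{out}} \subseteq \{0,1\}^c \times \{0,1\}^b$ 
as 

$$\mathcal{L}_{\mathrm{in}} := \{(\lambda(\pi(v)), \mu(v) \oplus K_w) : v \in V \setminus \{\varepsilon\}\}$$

$$\mathcal{L}_{\mathrm{out}} := \{(K_2, \lambda(v) \,\|\, 0^{b-c}) : v \in \mathcal{Q}_C\},$$

and let $\mathcal{L} = \mathcal{L}_{\mathrm{in}} \cup \mathcal{L}_{\mathrm{out}}$. 
Intuitively, $\mathcal{L}$ represents the set of inputs on which f is evaluated 
while processing A’s construction queries in the real experiment. This set is also 
well-defined in the ideal experiment by the above equations, and in both 
experiments it is determined by the transcript. We refer to 
$\mathcal{L}_{\mathrm{in}}$ as the set of inner f-invocations, i.e., those 
invocations of f that were required to evaluate the inner, whitened cascade 
$\mathrm{WCasc}^f$ in WNMAC; and similarly, $\mathcal{L}_{\mathrm{out}}$ denotes the 
outer invocations. 

If there is an intersection between the adversary’s f-queries and the inputs 
in $\mathcal{L}_{\mathrm{in}}$ (resp. $\mathcal{L}_{\mathrm{out}}$), we call this an 
inner (resp., outer) C-f-collision. We then denote by C-f-coll$_{\mathrm{in}}$ (resp., 
C-f-coll$_{\mathrm{out}}$) the event that any inner (resp., outer) C-f-collision 
occurs. Formally, 

$$\text{C-f-coll}_{\mathrm{in}} :\Leftrightarrow (\mathcal{Q}_f \cap \mathcal{L}_{\mathrm{in}} \ne \emptyset) \quad \text{and} \quad \text{C-f-coll}_{\mathrm{out}} :\Leftrightarrow (\mathcal{Q}_f \cap \mathcal{L}_{\mathrm{out}} \ne \emptyset)$$

and let C-f-coll := C-f-coll$_{\mathrm{in}}$ $\cup$ C-f-coll$_{\mathrm{out}}$. 
Furthermore, if the vertex labels $\lambda(M)$ collide for two distinct messages 
$M, M' \in \mathcal{Q}_C$, we call this a C-collision and denote such an event by 

$$\text{C-coll} :\Leftrightarrow (\exists M, M' \in \mathcal{Q}_C : M \ne M' \wedge \lambda(M) = \lambda(M')).$$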
Definition 1 (Good Transcripts). Let 

$$\tau = \bigl(\mathcal{QR}_C, \mathcal{QR}_f, K = (K_1, K_2, K_w), T^f_K(\mathcal{Q}_C) = (V, E, \lambda)\bigr)$$

be a valid transcript. We say that the transcript is good (and thus 
$\tau \in \mathrm{GT}$) if the following properties are true: 

(1) The event C-f-coll$_{\mathrm{out}}$ has not occurred. 

(2) The event C-coll has not occurred. 

(3) For any $v \in V$ we have $\lambda(v) \ne K_2$. 

We denote as GT the set of all good transcripts, and BT the set of all bad 
transcripts, i.e., transcripts which can possibly occur (i.e., they are in the range 
of $T_{\mathrm{ideal}}$) and are not good. More specifically, we denote by 
$\mathrm{BT}_i$ the set of all bad transcripts that do not satisfy the $i$-th 
property in the definition of a good transcript above, hence we have 
$\mathrm{BT} = \bigcup_{i=1}^{3} \mathrm{BT}_i$. 


3.5 Probability of a C-f-collision 

In this section we upper-bound the probability of C-f-coll by considering inner 
and outer C-f-collisions separately. 

Lemma 2. We have 
$P^{\mathrm{ideal}}[\text{C-f-coll}_{\mathrm{in}}] \le \ell q_C q_f/2^{b+c}$. 

Proof. We start by modifying the ideal experiment to obtain an experiment 
denoted ideal′ and the corresponding transcript distribution $T_{\mathrm{ideal}'}$. 
The experiment ideal′ is given in Fig. 3. Clearly, ideal′ differs from the ideal 
experiment only in the way the vertex labeling function $\lambda(\cdot)$ is 
determined. 

We now argue that 
$P^{\mathrm{ideal}}[\text{C-f-coll}_{\mathrm{in}}] = P^{\mathrm{ideal}'}[\text{C-f-coll}_{\mathrm{in}}]$. 
To see this, consider an intermediate experiment ideal″ that is defined exactly as 
ideal except that it uses a separate ideal compression function g to generate the 
vertex labels of the tree contained in the transcript, where g is completely 
independent of f queried by the adversary (i.e., the adversary queries f and the 
transcript contains $\mathcal{QR}_f$ and $T^g_K(\mathcal{Q}_C)$). It is now clear that 
$P^{\mathrm{ideal}}[\text{C-f-coll}_{\mathrm{in}}] = P^{\mathrm{ideal}''}[\text{C-f-coll}_{\mathrm{in}}]$ 
since as long as no inner C-f-collision happens, the experiments are identical. 

The remaining equality 
$P^{\mathrm{ideal}''}[\text{C-f-coll}_{\mathrm{in}}] = P^{\mathrm{ideal}'}[\text{C-f-coll}_{\mathrm{in}}]$ 
follows from the definition of ideal′. It is easy to see that the distribution of 
vertex labels sampled in steps 2 and 3 of ideal′ and by labeling the tree 
$T^g_K(\mathcal{Q}_C)$ in ideal″ are the same. In both cases, repeated inputs to the 
compression function lead to consistent outputs, while fresh inputs lead to 
independent random outputs. The two experiments only differ in the order of 
sampling: ideal″ first samples g and then performs the labeling, while ideal′ 
starts by sampling the repetition pattern, and then chooses the actual labels 
correspondingly. The same distribution of vertex labels in these two experiments 
then implies the same probability of C-f-coll$_{\mathrm{in}}$ occurring. 
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1. The adversary asks its C-queries and f-queries and these are 
answered by independent random values. Once the qc queries in 
Qc are fixed, they also determine the message tree T(Qc) and mappings 
p and 7 r as defined in Section 3.1 (the labeling A is so far undefined). 

2. Sample a repetition pattern p. The equivalence relation p is deter- 
mined indirectly by first iteratively defining a mapping p: V —> [\V\]. 
Recall the vertex ordering vi , , . . , v\v\ defined in Section 3.1. First, set 
p(vi) := 1. Then, for i taking values 2, ...,|R|, determine p(v%) as fol- 
lows. If there exists j G [i — 1] such that p(vj) = p(vi) and p(7r(vj)) = 
p(7r(vi)) then let p(vi) := p(vj) for the minimal such j. Otherwise let 
z := m3JCj e [i-i]{p(vj)} and sample p(vi) as 


1 with probability 2 c 


p(vi) 


z with probability 2 c 
^ z + 1 with probability 1 — z • 2 -c 


Finally, for all i,j G [\V\\ let p(vi,Vj ) (p(vi) = p(vj)). 

3. Sample a vertex labeling A( ) according to p. Namely, sample \p\ 
distinct uniformly random values si,...,S| p | G {0, 1} C where \p\ is the 
number of equivalence classes of p, and let A (vi) := s^ Vi ) for all i G [\V\]. 
Also let K\ X(e). 

4. Sample random keys (i’G, AA) G {0, 1} C x {0, l} 6 . 


Fig. 3. The random experiment idea F for the proofs of Lemmas 2 and 3. 
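
Steps 2 and 3 of the experiment in Fig. 3 can be sketched in code as follows. This is only an illustrative sketch under stated assumptions: the tree encoding (`parent`, `mu` as Python lists with vertex 0 as the root), the function names, and the restriction to toy values of c are all hypothetical choices made here and are not part of the paper.

```python
import random

def sample_repetition_pattern(parent, mu, c, rng):
    """Sample the class index of every vertex (step 2 of Fig. 3).

    Assumed encoding: vertex 0 is the root, parent[i] is the index of the
    parent of vertex i (None for the root), mu[i] is the message block on
    the edge into vertex i, every parent precedes its children, and the
    number of vertices is at most 2**c.
    """
    n = len(parent)
    rho_hat = [0] * n
    rho_hat[0] = 1
    for i in range(1, n):
        # Forced repetition: same block appended to a parent of the same class.
        match = next((j for j in range(1, i)
                      if mu[j] == mu[i]
                      and rho_hat[parent[j]] == rho_hat[parent[i]]), None)
        if match is not None:
            rho_hat[i] = rho_hat[match]
            continue
        z = max(rho_hat[:i])
        r = rng.randrange(2 ** c)
        # Each existing class 1..z is hit with probability 2^-c; with the
        # remaining probability 1 - z * 2^-c the vertex opens class z + 1.
        rho_hat[i] = r + 1 if r < z else z + 1
    return rho_hat

def sample_labels(rho_hat, c, rng):
    """Step 3: one distinct uniform c-bit value per class, copied to vertices."""
    values = rng.sample(range(2 ** c), max(rho_hat))
    return [values[k - 1] for k in rho_hat]
```

Because class indices are assigned contiguously, `max(rho_hat)` equals the number of equivalence classes, which is why it is used as the sample size in `sample_labels`.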


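Finally, we upper-bound the probability $P^{\mathsf{ideal}'}[\text{C-f-coll}_{\mathrm{in}}]$. Conditioned on the
repetition pattern $\rho$ taking some fixed value $rp$ in step 2, we have

$$\begin{aligned}
P^{\mathsf{ideal}'}[\text{C-f-coll}_{\mathrm{in}} \mid \rho = rp]
 &\le \sum_{v \in V \setminus \{\varepsilon\}} P^{\mathsf{ideal}'}\big[(\lambda(\pi(v)),\, \mu(v) \oplus K_w) \in Q_f \mid \rho = rp\big] \\
 &= \sum_{v \in V \setminus \{\varepsilon\}} P^{\mathsf{ideal}'}\big[(s_{\hat\rho(\pi(v))},\, \mu(v) \oplus K_w) \in Q_f \mid \rho = rp\big] \\
 &= \sum_{v \in V \setminus \{\varepsilon\}} q_f / 2^{b+c} \;\le\; \ell q_C q_f / 2^{b+c}
\end{aligned}$$

because the random variables $s_i$ and $K_w$ sampled in steps 3 and 4 are uniformly
distributed and independent of $Q_f$. Since this bound holds conditioned on $\rho$
being any fixed repetition pattern $rp$, it remains valid also without conditioning
on it, hence concluding the proof. □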


We proceed by upper-bounding the probability of an outer C-f-collision. 
Lemma 3. We have

$$P^{\mathsf{ideal}}[\text{C-f-coll}_{\mathrm{out}}] \le \frac{\ell q_C q_f}{2^{b+c}} + \frac{q_C q_f}{2^{2c}}.$$

Proof. Let us again consider the experiments $\mathsf{ideal}'$ and $\mathsf{ideal}''$ defined in the
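proof of Lemma 2. We start by the simple observation that for any event $A$ we
have

$$\begin{aligned}
P^{\mathsf{ideal}}[A] &= P^{\mathsf{ideal}}[A \wedge \text{C-f-coll}_{\mathrm{in}}] + P^{\mathsf{ideal}}[A \wedge \neg\text{C-f-coll}_{\mathrm{in}}] \\
 &\le \frac{\ell q_C q_f}{2^{b+c}} + P^{\mathsf{ideal}''}[A \wedge \neg\text{C-f-coll}_{\mathrm{in}}]
 \;\le\; \frac{\ell q_C q_f}{2^{b+c}} + P^{\mathsf{ideal}''}[A], \qquad (4)
\end{aligned}$$

which follows from Lemma 2 and the observation that $\mathsf{ideal}$ and $\mathsf{ideal}''$ only differ
if $\text{C-f-coll}_{\mathrm{in}}$ occurs.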

Applying (4) to the event $\text{C-f-coll}_{\mathrm{out}}$ as $A$, it remains to bound the probability
$P^{\mathsf{ideal}''}[\text{C-f-coll}_{\mathrm{out}}]$; for this we observe that $P^{\mathsf{ideal}''}[\text{C-f-coll}_{\mathrm{out}}] = P^{\mathsf{ideal}'}[\text{C-f-coll}_{\mathrm{out}}]$
similarly as before: the repetition pattern $\rho$ sampled in step 2 of $\mathsf{ideal}'$ has the
same distribution as the repetition pattern induced by the tree $T^g(Q_C)$ in $\mathsf{ideal}''$,
and this together with the sampling performed in step 3 results in the same
distribution of vertex labels in $\mathsf{ideal}''$ and $\mathsf{ideal}'$ and hence also in the same
probability of $\text{C-f-coll}_{\mathrm{out}}$ in both experiments.

Finally, to upper-bound the probability $P^{\mathsf{ideal}'}[\text{C-f-coll}_{\mathrm{out}}]$, again conditioned
on the repetition pattern $\rho$ sampled in step 2 taking some fixed value $rp$, we
have

$$\begin{aligned}
P^{\mathsf{ideal}'}[\text{C-f-coll}_{\mathrm{out}} \mid \rho = rp]
 &\le \sum_{v \in Q_C} P^{\mathsf{ideal}'}\big[(K_2,\, \lambda(v) \,\|\, 0^{b-c}) \in Q_f \mid \rho = rp\big] \\
 &\le \sum_{v \in Q_C} P^{\mathsf{ideal}'}\big[(K_2,\, s_{\hat\rho(v)} \,\|\, 0^{b-c}) \in Q_f \mid \rho = rp\big] \\
 &= \sum_{v \in Q_C} q_f / 2^{2c} \;\le\; q_C q_f / 2^{2c}
\end{aligned}$$

because the random variables $s_i$ and $K_2$ sampled in steps 3 and 4 are uniformly
distributed and independent of $Q_f$. Since this bound holds conditioned on $\rho$
being any fixed repetition pattern $rp$, it remains valid also without conditioning
on it. □


3.6 Probability of Repeated Outer Invocations 

In this section we analyze the probability that any of the outer f-invocations
in the ideal experiment will not be fresh; in particular, we upper-bound both
$P[T_{\mathsf{ideal}} \in BT_2]$ and $P[T_{\mathsf{ideal}} \in BT_3]$.


Lemma 4. We have

$$P^{\mathsf{ideal}}[\text{C-coll}] \le \frac{\ell q_C q_f}{2^{b+c}} + \frac{\ell q_C^2}{2^c}\left(d'(\ell) + \frac{64\ell^3}{2^c}\right).$$

Proof. Applying (4) to the event C-coll, we have $P^{\mathsf{ideal}}[\text{C-coll}] \le \ell q_C q_f / 2^{b+c} + P^{\mathsf{ideal}''}[\text{C-coll}]$.
Since the queries $Q_C$ in the experiment $\mathsf{ideal}''$ are chosen non-adaptively
(with respect to the keys $K_1$, $K_w$ and the function $g$ used to later
compute the tree labeling), we can obtain via a union bound that

$$P^{\mathsf{ideal}''}[\text{C-coll}] \le q_C^2 \cdot \max_{\substack{M_1 \neq M_2 \\ |M_1|, |M_2| \le \ell b}} P^{g, K_1, K_w}\big[\mathsf{WCasc}^g_{K_1,K_w}(M_1) = \mathsf{WCasc}^g_{K_1,K_w}(M_2)\big].$$

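Moreover, we have

$$\begin{aligned}
&\max_{\substack{M_1 \neq M_2 \\ |M_1|, |M_2| \le \ell b}} P^{g, K_1, K_w}\big[\mathsf{WCasc}^g_{K_1,K_w}(M_1) = \mathsf{WCasc}^g_{K_1,K_w}(M_2)\big] \\
&\quad= \max_{\substack{M_1 \neq M_2 \\ |M_1|, |M_2| \le \ell b}} \sum_{\substack{K_1 \in \{0,1\}^c \\ K_w \in \{0,1\}^b}} \frac{1}{2^{c+b}} \cdot P^{g}\big[\mathsf{WCasc}^g_{K_1,K_w}(M_1) = \mathsf{WCasc}^g_{K_1,K_w}(M_2)\big] \\
&\quad\le \sum_{\substack{K_1 \in \{0,1\}^c \\ K_w \in \{0,1\}^b}} \frac{1}{2^{c+b}} \cdot \max_{\substack{M_1 \neq M_2 \\ |M_1|, |M_2| \le \ell b}} P^{g}\big[\mathsf{WCasc}^g_{K_1,K_w}(M_1) = \mathsf{WCasc}^g_{K_1,K_w}(M_2)\big] \\
&\quad= \sum_{\substack{K_1 \in \{0,1\}^c \\ K_w \in \{0,1\}^b}} \frac{1}{2^{c+b}} \cdot \max_{\substack{M_1 \neq M_2 \\ |M_1|, |M_2| \le \ell b}} P^{g}\big[\mathsf{Casc}^g_{K_1}(M_1 \oplus K_w) = \mathsf{Casc}^g_{K_1}(M_2 \oplus K_w)\big] \\
&\quad= \sum_{\substack{K_1 \in \{0,1\}^c \\ K_w \in \{0,1\}^b}} \frac{1}{2^{c+b}} \cdot \underbrace{\max_{\substack{M_1 \neq M_2 \\ |M_1|, |M_2| \le \ell b}} P^{g}\big[\mathsf{Casc}^g_{K_1}(M_1) = \mathsf{Casc}^g_{K_1}(M_2)\big]}_{\mathsf{CascColl}(\ell)},
\end{aligned}$$

where the notation $M_i \oplus K_w$ denotes XOR-ing the key $K_w$ to each of the
blocks of $M_i$.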

The last maximization term above was already studied in the context of the
construction NI2 in [7], where it was denoted as $\mathsf{CColl}(\ell)$, but we will refer to it
as $\mathsf{CascColl}(\ell)$ to avoid confusion with the event C-coll considered here. It was
shown in [7] that

$$\mathsf{CascColl}(\ell) \le \frac{\ell \cdot d'(\ell)}{2^c} + \frac{64\ell^4}{2^{2c}}. \qquad (5)$$

Putting all the above bounds together concludes the proof of Lemma 4. □


Lemma 5. We have

$$P^{\mathsf{ideal}}[\exists v \in V : \lambda(v) = K_2] \le \frac{\ell q_C}{2^c}.$$

Proof. As is clear from the description of the ideal experiment, the key $K_2$ is
chosen uniformly at random and independently of the rest of the experiment, in
particular of the labels $\lambda(v)$. The lemma hence follows by a simple union bound
over all $\ell q_C$ vertices $v \in V$. □

3.7 Good Transcripts and Putting Pieces Together

Let us consider a good transcript $\tau$. First, since $\tau \notin BT_1$, there is no overlap
between the outer f-invocations and the f-queries issued by the adversary. Second,
since $\tau \notin BT_2$, there is also no repetition between the outer f-invocations
themselves. Finally, since $\tau \notin BT_3$, there is also no overlap between the outer
f-invocations and the inner f-invocations (all the outer invocations contain $K_2$
as their first component). Altogether, this means that each outer f-invocation
in $\mathsf{real}$ is fresh and hence its outcome can be seen as freshly uniformly sampled
(since f is an ideal random function). Therefore, the distribution of these outcomes
will be the same as in $\mathsf{ideal}$, where they correspond to the independent
random values $Y_i$. Hence, for all $\tau \in GT$, we have

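$$\frac{P[T_{\mathsf{real}} = \tau]}{P[T_{\mathsf{ideal}} = \tau]} \ge 1.$$

Plugging this into Lemma 1, together with the bounds from Lemmas 3, 4
and 5, we obtain

$$\begin{aligned}
\mathsf{Adv}^{\mathsf{prf}}_{\mathsf{WNMAC}}(A) &\le \sum_{i=1}^{3} P[T_{\mathsf{ideal}} \in BT_i] \\
 &\le \frac{q_C q_f}{2^{2c}} + 2 \cdot \frac{\ell q_C q_f}{2^{b+c}} + \frac{\ell q_C^2}{2^c}\left(d'(\ell) + \frac{64\ell^3}{2^c}\right) + \frac{\ell q_C}{2^c} \\
 &\le \frac{q_C q_f}{2^{2c}} + 2 \cdot \frac{\ell q_C q_f}{2^{b+c}} + \frac{\ell q_C^2}{2^c}\left(d'(\ell) + \frac{64\ell^3}{2^c} + 1\right),
\end{aligned}$$

which concludes the proof of Theorem 1. □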

3.8 Tightness 

We now argue that the $q_C q_f / 2^{2c}$ term in our bound on the security of WNMAC
as given in (2) is tight, by giving a matching attack (up to a linear factor $\Theta(c)$).
For most practical parameters, this will be the dominating term in (2), and thus
for those parameters Theorem 1 gives a tight bound. Here we only describe an
attack for the case where $q_C = \Theta(c)$ is very small, and defer the general case to
the full version.

The $q_C = \Theta(c)$ Case. We must define an adversary $A^{O,f}$ who can distinguish
the case where the first oracle $O$ implements a random function $R$ from the case
where it implements $\mathsf{WNMAC}^f((K_1, K_2, K_w), \cdot)$ with random keys
using the random function $f : \{0,1\}^{b+c} \to \{0,1\}^c$ which is given as the second
oracle.

$A^{O,f}$ first picks $t := q_f / 2^c$ keys $K_1, \ldots, K_t$ arbitrarily, and then uses its $q_f$
function queries to learn the outputs

$$Z_i = \{ f(K_i, x \,\|\, 0^{b-c}) : x \in \{0,1\}^c \}$$

for all the keys. When throwing $2^c$ balls randomly into $2^c$ bins, we expect
a $1 - 1/e \approx 0.63$ fraction of the bins to be non-empty (and the value is
strongly concentrated around this expectation). We can think of evaluating
the random function $f(K_i, \cdot \,\|\, 0^{b-c}) : \{0,1\}^c \to \{0,1\}^c$ as throwing $2^c$ balls
(the inputs) to random bins (the outputs), and thus have $|Z_i| \approx 0.63 \cdot 2^c$.
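
The balls-into-bins estimate above is easy to check empirically for small c. The following is a minimal simulation sketch; the function name and the use of Python's `random` module as a stand-in for a random function are choices made for this illustration only.

```python
import random

def image_fraction(c, rng):
    """Fraction of {0,1}^c that is hit by a fresh random function
    {0,1}^c -> {0,1}^c, i.e. the fraction of non-empty bins after
    throwing 2^c balls into 2^c bins."""
    size = 2 ** c
    image = {rng.randrange(size) for _ in range(size)}
    return len(image) / size

# Example run for c = 12; the result should be close to 1 - 1/e.
print(image_fraction(12, random.Random(2015)))
```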

Then $A^{O,f}$ queries $O$ on $\Theta(c)$ random inputs; let $Q_C$ denote the corresponding
outputs. Now $A^{O,f}$ outputs 1 if and only if for some $i$ we have $Q_C \subseteq Z_i$. If
$O(\cdot) = \mathsf{WNMAC}^f((K_1, K_2, K_w), \cdot) = f(K_2, \mathsf{WCasc}^f((K_1, K_w), \cdot) \,\|\, 0^{b-c})$ and moreover
$K_2 = K_i$ for some $i$ (which happens with probability $t/2^c$), then all the
outputs of $O(\cdot)$ are in the range of $f(K_i, \cdot \,\|\, 0^{b-c})$ and thus $A^{O,f}$ outputs 1.

On the other hand, if $O(\cdot)$ is a random function, then every single query will
miss the set $Z_i$ with constant probability 0.37. Using this, we get by a Chernoff
bound (and the union bound over all $t$ keys) that

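$$P[\exists i : Q_C \subseteq Z_i] \le \frac{t}{2^{\Theta(c)}}.$$

Summing up we get for $q_C = \Theta(c)$ and $t = q_f / 2^c$

$$\mathsf{Adv}^{\mathsf{prf}}_{\mathsf{WNMAC}}(A^{O,f}) \ge \frac{t}{2^c} - \frac{t}{2^{\Theta(c)}} \ge \frac{t}{2^{c+1}} = \frac{q_f}{2^{2c+1}} = \frac{q_f q_C}{2^{2c} \cdot \Theta(c)}$$

which matches our term $q_f q_C / 2^{2c}$ from the lower bound up to a $\Theta(c)$ factor.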


3.9 Distinguishing-H Security of WNMAC 

The above results also imply a bound on the distinguishing-H security of
WNMAC. To capture this, we first introduce the notion of distinguishing-C,
which corresponds to PRF-security with the restriction that the distinguisher
only uses construction queries.

Definition 2 (Distinguishing-C). Let $C[f] : \{0,1\}^{\kappa} \times \mathcal{D} \to \mathcal{R}$ be a keyed
construction making queries to a randomly chosen compression function
$f \xleftarrow{\$} \mathcal{F}(c+b, c)$. The distinguishing-C advantage of an adversary $A$ is defined as

$$\mathsf{Adv}^{\mathsf{dist\text{-}C}}_{C[f]}(A) := \Big| P\big[K \xleftarrow{\$} \{0,1\}^{\kappa},\, f \xleftarrow{\$} \mathcal{F}(c+b, c) : A^{C_K[f]} \Rightarrow 1\big] - P\big[R \xleftarrow{\$} \mathcal{F}(\mathcal{D}, \mathcal{R}) : A^{R} \Rightarrow 1\big] \Big|.$$

The notion of distinguishing-C is useful for bridging distinguishing-H and PRF-security,
as the following lemma shows (we omit its simple proof).

Lemma 6. For every adversary $A$ asking $q_C$ and $q_f$ construction and primitive
queries, respectively, there exists an adversary $A'$ asking $q_C$ queries to its single
oracle such that

$$\mathsf{Adv}^{\mathsf{dist\text{-}H}}_{C[f]}(A) \le \mathsf{Adv}^{\mathsf{prf}}_{C[f]}(A) + \mathsf{Adv}^{\mathsf{dist\text{-}C}}_{C[f]}(A')$$

and

$$\mathsf{Adv}^{\mathsf{prf}}_{C[f]}(A) \le \mathsf{Adv}^{\mathsf{dist\text{-}H}}_{C[f]}(A) + \mathsf{Adv}^{\mathsf{dist\text{-}C}}_{C[f]}(A').$$

One can readily obtain a bound on the distinguishing-C security of WNMAC
using Theorem 1 with $q_f = 0$.

Lemma 7 (Distinguishing-C Security of WNMAC). Let $A$ be an adversary
making at most $q_C$ construction queries, each of length at most $\ell$ b-bit blocks.
Let $K = (K_1, K_2, K_w) \in \{0,1\}^c \times \{0,1\}^c \times \{0,1\}^b$ be a tuple of random keys.
Then we have

$$\mathsf{Adv}^{\mathsf{dist\text{-}C}}_{\mathsf{WNMAC}^f_K}(A) \le \frac{\ell q_C^2}{2^c}\left(d'(\ell) + \frac{64\ell^3}{2^c} + 1\right).$$

By combining Theorem 1 and Lemmas 6 and 7, we get the following theorem.


Theorem 2 (Distinguishing-H Security of WNMAC). Let $A$ be an adversary
making at most $q_f$ queries to the compression function and at most $q_C$ construction
queries, each of length at most $\ell$ b-bit blocks. Let $K = (K_1, K_2, K_w) \in
\{0,1\}^c \times \{0,1\}^c \times \{0,1\}^b$ be a tuple of random keys. Then we have

$$\mathsf{Adv}^{\mathsf{dist\text{-}H}}_{\mathsf{WNMAC}^f_K}(A) \le \frac{q_C q_f}{2^{2c}} + 2 \cdot \frac{\ell q_C q_f}{2^{b+c}} + 2 \cdot \frac{\ell q_C^2}{2^c}\left(d'(\ell) + \frac{64\ell^3}{2^c} + 1\right).$$

3.10 State Recovery for WNMAC 

We now formally define the notion of security against state recovery for WNMAC.
We consider the strong notion where the goal of the adversary is to output a
pair $(M, s)$ such that the state $s$ occurs at any point during the evaluation of
WCasc on $M$. Formally, we define $\mathsf{Adv}^{\mathsf{sr}}_{\mathsf{WNMAC}[f]}(A)$ to be

$$P\Big[K \xleftarrow{\$} \mathcal{K},\, f \xleftarrow{\$} \mathcal{F},\, A^{\mathsf{WNMAC}^f_K,\, f} \Rightarrow (M, s) \;:\; \exists M' \in \{0,1\}^{b*} \text{ s.t. } M' \preceq M \,\wedge\, \mathsf{WCasc}^f_{K_1,K_w}(M') = s\Big]$$

where $\mathcal{K} = \{0,1\}^c \times \{0,1\}^c \times \{0,1\}^b$, $K = (K_1, K_2, K_w)$, $\mathcal{F} := \mathcal{F}(c+b, c)$, and
$M' \preceq M$ means that $M'$ is a prefix of $M$.


Theorem 3 (State-Recovery Security of WNMAC). Let $A$ be an adversary
making at most $q_f$ queries to the compression function and at most $q_C$ construction
queries, each of length at most $\ell$ b-bit blocks. Let $K = (K_1, K_2, K_w) \in
\{0,1\}^c \times \{0,1\}^c \times \{0,1\}^b$ be a tuple of random keys. Then we have


Adv WNMAC f^(A) < 


me £qcQf 

22c 2^+ c 


+ 2 - 


tqc 2 



64£ 3 


2 C 


2 C 




Proof (sketch). First, we replace the compression function oracle $f$ by an independent
random function $g$ completely unrelated to $\mathsf{WNMAC}^f$. The error introduced
by this is upper-bounded by Theorem 2 and now, compression-function queries
are useless to the adversary, hence we can disregard them.

Let us denote by $\mathcal{E}$ the experiment where $A$ interacts with $\mathsf{WNMAC}^f$ (without
direct access to $f$). Consider an alternative experiment $\mathcal{E}'$ given in Fig. 4. As
long as the key $K_2$ chosen in step 4 does not hit any of the internal states
that occurred during the query evaluation, the experiment $\mathcal{E}'$ is identical to $\mathcal{E}$.
Moreover, since $K_2$ is chosen independently at random, such a hit can only occur
with probability at most $\ell q_C / 2^c$. Since the vertex labels are only sampled after
the adversary makes its guess for the state, the probability that the guess will
be correct is at most $\ell / 2^c$. □


1. The adversary asks its C-queries. For each of them, only the repetition
pattern for the state values belonging to this query is sampled (as
in the experiment $\mathsf{ideal}'$ in Figure 3) and the query is answered with a
fresh random value, unless the outer f-invocation happens on a repeated
value, in which case the query is answered consistently. After answering
all queries, we have a complete repetition pattern $\rho$ for all state values.

2. Let $A$ output its guess $(M, s)$.

3. Sample a vertex labeling $\lambda(\cdot)$ according to $\rho$, let $K_1 := \lambda(\varepsilon)$.

4. Sample random keys $(K_2, K_w) \in \{0,1\}^c \times \{0,1\}^b$.


Fig. 4. The random experiment $\mathcal{E}'$ for the proof of Theorem 3.


4 Whitening HMAC 

HMAC is a “practice-oriented” variant of NMAC, see Sect. 2 for its definition.
In this section we consider a “whitened” variant WHMAC of HMAC which is
derived from HMAC in the same way as WNMAC was derived from NMAC,
i.e., by XORing a random key $K_w$ to every message block. We also consider a
variant WHMAC$^+$ where the first message block is a fresh key $K^+ \in \{0,1\}^b$.
More precisely,

$$\mathsf{WHMAC}_{K, K_w}[f](m) := f\big(K_2',\, \mathsf{WCasc}^f_{K_1', K_w}(m) \,\|\, \mathsf{fpad}\big)$$

where

$$K_1' := f(\mathsf{IV}, K \oplus \mathsf{ipad}) \quad\text{and}\quad K_2' := f(\mathsf{IV}, K \oplus \mathsf{opad})$$

and fpad is some fixed padding; and

$$\mathsf{WHMAC}^+_{K, K_w, K^+}[f](m) := f\big(K_2',\, \mathsf{WCasc}^f_{K_1', K_w}(m) \,\|\, \mathsf{fpad}\big), \qquad (6)$$

where this time

$$Z := f(\mathsf{IV}, K \oplus \mathsf{ipad}) \quad\text{and}\quad K_1' := f(Z, K^+) \quad\text{and}\quad K_2' := f(\mathsf{IV}, K \oplus \mathsf{opad})$$

and fpad is again some padding. Note that both variants, WHMAC and
WHMAC$^+$, can be implemented given just black-box access to an implementation
of HMAC.
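
The black-box remark can be illustrated with a short sketch on top of Python's standard `hmac` module. This is an illustration under explicit assumptions, not the paper's reference implementation: HMAC-SHA-256 (64-byte blocks) is chosen arbitrarily, messages are assumed to be a whole number of blocks, the function names are hypothetical, and the length-padding block that SHA-256 appends internally is not whitened, which differs from the idealized padding fpad used in the definitions above.

```python
import hashlib
import hmac

BLOCK = 64  # block length b in bytes for SHA-256 (assumed instantiation)

def whiten(message: bytes, k_w: bytes) -> bytes:
    """XOR the whitening key K_w into every b-bit block of the message."""
    if len(k_w) != BLOCK or len(message) % BLOCK != 0:
        raise ValueError("K_w must be one block; message must be block-aligned")
    return bytes(m ^ k_w[i % BLOCK] for i, m in enumerate(message))

def whmac(key: bytes, k_w: bytes, message: bytes) -> bytes:
    """WHMAC sketch: ordinary HMAC applied to the whitened message."""
    return hmac.new(key, whiten(message, k_w), hashlib.sha256).digest()

def whmac_plus(key: bytes, k_w: bytes, k_plus: bytes, message: bytes) -> bytes:
    """WHMAC+ sketch: the fresh key K+ is fed as the first (unwhitened)
    block, followed by the whitened message."""
    if len(k_plus) != BLOCK:
        raise ValueError("K+ must be exactly one block")
    return hmac.new(key, k_plus + whiten(message, k_w), hashlib.sha256).digest()
```

Both functions only call the unmodified HMAC routine, which is the point of the black-box observation: the extra cost is one XOR pass over the message (and one additional block for the WHMAC$^+$ variant).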

The theorem below relates the security of WHMAC and WHMAC$^+$ to the
security of WNMAC.

Theorem 4 (Relating Security of WHMAC to WNMAC). Consider any
xxx $\in$ {prf, dist-H, sr}. Assume that for every adversary $A$ making at most $q_f$
queries to the compression function $f$ and at most $q_C$ construction queries, each
of length at most $\ell$ b-bit blocks, we have

$$\mathsf{Adv}^{\mathsf{xxx}}_{\mathsf{WNMAC}_{K_1, K_2, K_w}[f]}(A) \le \epsilon$$

where here and below, $K_1, K_2 \in \{0,1\}^c$ and $K, K_w, K^+ \in \{0,1\}^b$ are uniformly
random keys. Then for every such adversary $A$ we have

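$$\mathsf{Adv}^{\mathsf{xxx}}_{\mathsf{WHMAC}_{K, K_w}[f]}(A) \le \epsilon + 2^{-(b-2c)/2} \qquad (7)$$

and

$$\mathsf{Adv}^{\mathsf{xxx}}_{\mathsf{WHMAC}^+_{K, K_w, K^+}[f]}(A) \le \epsilon + 2 \cdot 2^{-(b-c)/2}. \qquad (8)$$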

Proof. Intuitively, for WHMAC one can think of $f$ as an extractor which extracts
keys $K_1', K_2'$ from $K$, and the bound then readily follows by the leftover hash
lemma. For WHMAC$^+$ one can roughly think of $K_1'$ and $K_2'$ as being extracted
from independent keys $K^+$ and $K$, respectively. For the latter it is thus sufficient
that $b$ (which is the length, and thus also the entropy of the uniform $K$ and $K^+$)
is sufficiently larger than $c$ (the length of $K_1', K_2'$), whereas for the former we
need $b$ to be sufficiently larger than $2c$. We now give the details of the proof for
WHMAC and postpone the treatment of WHMAC$^+$ to the full version.

In order to prove the bound (7) it is sufficient to show that the statistical
distance between the transcripts (as seen by the adversary) when interacting
with WNMAC or WHMAC is at most $2^{-(b-2c)/2}$. As the only difference between
WNMAC and WHMAC is that we replace the uniform keys $K_1, K_2$ with keys
$K_1', K_2'$ derived according to (6), to bound the distance between the transcripts,
it is sufficient to bound the distance between the random and derived keys. As
$K_1', K_2'$ are not independent of $f$, it is important to bound the distance when
given $f$; concretely, we must show that

$$\mathsf{SD}\big((K_1', K_2', f),\, (K_1, K_2, f)\big) \le 2^{-(b-2c)/2}.$$

We will use the leftover hash lemma [12] which states that for any random variable
$X \in \{0,1\}^m$ with min-entropy at least $H_\infty(X) \ge k$ and a hash function
$h : \{0,1\}^m \to \{0,1\}^{\ell}$ chosen from a family of pairwise independent hash functions
we have (with $U_\ell$ being uniform over $\{0,1\}^{\ell}$)

$$\mathsf{SD}\big((h(X), h),\, (U_\ell, h)\big) \le 2^{(\ell - H_\infty(X))/2} \le 2^{(\ell - k)/2}.$$

Since $f : \{0,1\}^{b+c} \to \{0,1\}^c$ is uniformly random, also the function

$$f'(K) = \big(f(\mathsf{IV}, K \oplus \mathsf{ipad}),\, f(\mathsf{IV}, K \oplus \mathsf{opad})\big)$$

is uniformly random, and thus also pairwise independent. Using $H_\infty(K) =
H_\infty(K \oplus \mathsf{ipad}) = b$ and $(K_1', K_2') = f'(K)$ we thus get

$$\mathsf{SD}\big((K_1', K_2', f'),\, (K_1, K_2, f')\big) = \mathsf{SD}\big((K_1', K_2', f),\, (K_1, K_2, f)\big) \le 2^{-(b-2c)/2}$$

as required. The first equality above holds as $f$ defines all of $f'$ and vice versa. □


5 The Dual WNMAC Construction 


Looking at the security bounds for WNMAC given in Sect. 3 from a distance, it
seems that under reasonable assumptions the most restrictive term in the bounds
is $q_f q_C / 2^{2c}$. Intuitively speaking, the reason for this term is the outer f-call in
WNMAC that only takes $2c$ bits of actual inputs and adds $b - c$ padding zeroes.

In an attempt to overcome this limitation, we propose a variant of the
WNMAC construction that we call Dual WNMAC (DWNMAC). We prove a
PRF-security bound for DWNMAC that goes beyond the restrictive term $q_f q_C / 2^{2c}$, and
our proof again extends also to distinguishing-H and state-recovery security.
The price we pay for this improvement is a slight increase in the key length and
the fact that DWNMAC cannot be implemented using only black-box access to
NMAC. Similarly, if we apply the same modification to WHMAC, the resulting
construction can no longer be implemented using black-box access to HMAC.

The construction DWNMAC is derived from WNMAC, the only difference
being that the outer f-call is performed on the c-bit state and a b-bit key.
More precisely, for a key tuple $(K_1, K_2, K_w) \in \{0,1\}^c \times \{0,1\}^b \times \{0,1\}^b$ and a
message $M \in \{0,1\}^{b*}$, we define

$$\mathsf{DWNMAC}^f((K_1, K_2, K_w), M) := f\big(\mathsf{WCasc}^f_{K_1, K_w}(M),\, K_2\big).$$
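
The difference between the outer calls of WNMAC and DWNMAC can be made concrete with a toy sketch. Everything below is illustrative: the compression function `f` is a stand-in built from SHA-256 (the paper models f as an ideal random function), the byte lengths for c and b are arbitrary choices, and all function names are hypothetical.

```python
import hashlib

C_BYTES = 32  # length of the chaining value / output, c bits (toy choice)
B_BYTES = 64  # length of a message block, b bits (toy choice)

def f(state: bytes, block: bytes) -> bytes:
    """Toy stand-in for the compression function f: {0,1}^(c+b) -> {0,1}^c."""
    assert len(state) == C_BYTES and len(block) == B_BYTES
    return hashlib.sha256(state + block).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def wcasc(k1: bytes, k_w: bytes, message: bytes) -> bytes:
    """Whitened cascade: iterate f from K_1 over the blocks of M, each XORed with K_w."""
    assert len(message) % B_BYTES == 0
    state = k1
    for i in range(0, len(message), B_BYTES):
        state = f(state, xor(message[i:i + B_BYTES], k_w))
    return state

def wnmac(k1: bytes, k2: bytes, k_w: bytes, message: bytes) -> bytes:
    """Outer call keyed by the c-bit K_2; the state is padded with b - c zero bits."""
    return f(k2, wcasc(k1, k_w, message) + bytes(B_BYTES - C_BYTES))

def dwnmac(k1: bytes, k2: bytes, k_w: bytes, message: bytes) -> bytes:
    """Outer call takes the c-bit state first and a full b-bit key K_2 as the block."""
    return f(wcasc(k1, k_w, message), k2)
```

In `wnmac` the outer input contains only 2c unpredictable bits, while in `dwnmac` all b + c input bits of the outer call are hard to predict, which is the intuition behind the improved bound.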

Note that DWNMAC is somewhat similar to what we would obtain by applying
whitening to the Sandwich MAC construction [23].

We now summarize the security of DWNMAC. 


Theorem 5 (Security of DWNMAC). Let $A$ be an adversary making at most
$q_f$ queries to the compression function $f$ and at most $q_C$ construction queries, each
of length at most $\ell$ b-bit blocks. Let $K = (K_1, K_2, K_w) \in \{0,1\}^c \times \{0,1\}^b \times \{0,1\}^b$
be a tuple of random keys. Then we have


A^ V DWNMAC f (A) < 3 


Igcgf 

2 6+c 


^c 2 




for all xxx $\in$ {prf, dist-H, sr}.

Proof (sketch). The proofs are analogous to the proofs for WNMAC given in
Sect. 3, with the main modification needed in Lemma 3 where the probability
of an outer C-f-collision can be upper-bounded by $q_C q_f / 2^{b+c}$. Roughly speaking,
this is because the outer call in DWNMAC does not contain the $0^{b-c}$
padding and instead processes $b + c$ bits of input that are hard to predict for the
attacker. □

Abstract. It is well known that three and four rounds of the balanced Feistel
cipher or Luby-Rackoff (LR) encryption for two-block messages are a
pseudorandom permutation (PRP) and a strong pseudorandom permutation
(SPRP), respectively. A block is n-bit long for some positive integer
n and a (possibly keyed) block-function is a nonlinear function mapping
the set of all blocks to itself, e.g., a blockcipher. XLS (extended Latin
Square) encryption defined over two-block inputs with three blockcipher
calls was claimed to be SPRP. However, Nandi later showed that it is not
an SPRP. Motivated by these observations, we consider the following
question in this paper: What is the minimum number of invocations of
block-functions required to achieve PRP or SPRP security over $\ell$-block
inputs? To answer this question, we consider all those length-preserving
encryption schemes, called linear encryption modes, for which the only
nonlinear operations are block-functions. Here, we prove the following
results for these encryption schemes:

1. At least $2\ell$ (or $2\ell - 1$) invocations of block-functions are required to
achieve SPRP (or PRP, respectively). These bounds are also tight.

2. To achieve the above bound for PRP over $\ell > 1$ blocks, either we need
at least two keys or it cannot be inverse-free (i.e., we need to apply the
inverses of block-functions in the decryption). In particular, we show
that a single-keyed inverse-free PRP needs $2\ell$ invocations of block
functions.

3. We show that 3-round LR using a single-keyed pseudorandom function
(PRF) is PRP if we xor a block of input by a masking key.


Keywords: XLS • CMC • Luby-Rackoff • PRP • SPRP • Blockcipher 


1 Introduction 

Block function. For all symmetric key algorithms, domains (and sometimes also
ranges) are desired to be sets of bit-strings of variable sizes. However, almost
all known methodologies, known as modes, use one or more (usually keyed)
functions defined over small and fixed lengths (e.g., blockcipher, compression
function, permutations in sponge constructions etc.) in a black-box manner.
We call a function from $I_n := \{0,1\}^n$ (elements of the set are called blocks)
to itself a block function. Throughout the paper we fix a positive integer n.
A keyed blockcipher is a popular example of a block function. Multiplying (as a
field multiplication over $I_n$) an element by a secret key $K$ can also be considered
to be a block function as it maps a block input $x$ to $K \cdot x \in I_n$. Outputs
of a streamcipher with a one-block seed can also be viewed as a sequence of
executions of different block functions. In fact, any function mapping one block
to multiple blocks can be viewed as a sequence of executions of block functions.
In contrast, a function mapping multiple blocks to a single block cannot in
general be expressed through block functions. For example, a compression function,
or the mapping of $(x, y)$ to $(x + K) \cdot (y + K)$ (known as pseudo dot-product), is not
a block function as it takes more than one block as an input.

Length-Preserving Encryption. An encryption algorithm is called length-preserving
if the number of blocks in a plaintext and in its corresponding
ciphertext are the same. A length-preserving encryption is called an enciphering
scheme. In addition to its theoretical interest, an enciphering scheme has
some practical applications. Among others, a popular application is disk-sector
encryption addressed by the “IEEE Security in Storage” Work Group P1619.
An enciphering scheme is said to be (S)PRP or (strong) pseudorandom permutation
[34, 35] if it is secure against adversaries making only plaintext queries (or
both plaintext and ciphertext queries, respectively). The building block keyed block
function is assumed to be PRP or PRF (pseudorandom function [12]).

Linear Mode. In this paper we consider a wide class of enciphering schemes and
pseudorandom functions based on linear mode. Informally, a linear mode (LM)
is defined by an oracle algorithm which interacts with block functions (usually
keyed) as oracles such that all inputs of the block functions are computed through
some public linear functions (determined in the design) of the previously obtained
responses. Finally, the output is also computed through a public linear function
of all responses of block functions and the input.

This class is indeed a wide class of encryption algorithms. Most of the
known symmetric key encryptions, e.g., Luby-Rackoff (LR) [23,28], Feistel type
Encryption Schemes [6,17], CMC [15], EME [13,16], HCTR [9,51], TET [14],
HEH [47] etc., are examples of enciphering schemes based on linear mode.
Almost all pseudorandom functions (e.g., CBC-MAC [5], PMAC [8], TMAC [22],
OMAC [18], DAG-based constructions [20], a sub-class of affine domain extension
or ADE [29] etc.) are also based on linear mode. Thus, the linear mode
based keyed construction includes a wide class of symmetric key algorithms.


1.1 Brief Literature Survey 

Now we briefly revisit the related results. The Feistel structure is used to define
different blockciphers, e.g., Lucifer [50], DES etc. Later, Luby and Rackoff provided
the PRP and SPRP security analysis of this type of cipher and since then it is
also popularly known as the Luby-Rackoff (LR) cipher. In particular it was shown
that three and four round LR ciphers are PRP and SPRP secure, respectively.
Each round invokes exactly one block function. There are many results known
for the security analysis of different rounds of LR and for different forms of Feistel
structures [6,28,39,40,48]. Many results are known for reducing the key-sizes (i.e.,
reusing the round keys [37,38,42,46]). Nandi [28] has characterized that all secure
LR encryption schemes must have non-palindrome key-scheduling algorithms.
Thus, we cannot use one single key.

XLS [43] was proposed to construct a generic encryption scheme which takes
incomplete message blocks, given an encryption scheme which can take complete
message blocks. A particular instantiation of XLS invokes three block functions
and was claimed to be SPRP secure. However, the result was shown to be wrong [31]
and some of its implications (e.g., for COPA [2] which uses XLS) were shown very
recently [32]. Among all linear mode based length-preserving SPRPs, CMC
and four-round Luby-Rackoff require only $2\ell$ calls for encrypting $\ell$ blocks and
others require more (e.g., EME requires $2\ell + 1$ calls etc.). Understanding the optimality
of SPRP and PRP, in terms of the number of blockcipher or block function
calls, is the main motivation of this paper.

A class of authenticated encryption modes linear over the field was proposed
by Jutla [21]. This class is more restricted than our linear mode as the linearity
is considered over $I_n$ instead of binary. In other words, the only linear operation
is bit-wise xor (without any rotation or permutation of bit positions,
multiplication by a primitive element etc.). Jutla showed that the number of
invocations of blockcipher calls plus the number of masking keys should be about
$\ell + O(\log_2 \ell)$.

1.2 Our Contribution 

(1) Optimality in PRP and SPRP. Lear Bahack in his submission of the
design called Julius [1] stated that $2\ell - 1$ blockcipher encryptions are required for
achieving a “simple linear mode” PRP over $\ell$ blocks. However, that result is still
unpublished, and so formalizing the issue and proving such a statement is yet
to be done. Moreover, no such claim is known for SPRP security. In this paper we
provide a formal definition of linear mode in Sect. 3. In Sect. 4, we formally show
that a linear mode based length-preserving PRP (or SPRP) over $\ell$ blocks must
invoke block functions at least $2\ell - 1$ (or respectively, $2\ell$) times. This justifies
why XLS or three rounds of Luby-Rackoff are not SPRP. This bound is tight
as three and four-round LR, CMC (for arbitrary block messages) etc. achieve
these bounds.

(2) Optimality in Single-key Inverse-Free PRP. Inverse-free encryptions
[6,17,19,23] like the LR cipher are useful in terms of implementation as we
do not need to implement the inverse of the building block for the combined
implementation of encryption and decryption. In Sect. 5, we show that any linear-mode
based inverse-free single-key length-preserving PRP over $\ell$ blocks requires
at least $2\ell$ invocations (which is actually the same as for SPRP constructions). This
shows that PRP and SPRP become equally costly for single-keyed inverse-free
encryptions. Although all distinguishers of our paper are differential distinguishers,
the PRP distinguisher for an inverse-free single-key construction is different
from the above SPRP attacks.

(3) Three-Round Single-PRF Based LR with a Masking is PRP. The
above observation says that to achieve an inverse-free double-block PRP with three
invocations, we can use two independent PRFs (e.g., the constructions in [28]
are such examples). Two independently keyed PRFs may be more costly than one
keyed PRF due to key-scheduling or key set-up algorithms [10,44]. In the later
part of Sect. 5, we show that the single PRF based three-round LR is indeed
PRP if we simply mask one block of the input by a masking key.

Significance. Our two distinguishing attacks above provide a limitation on the
performance of an (inverse-free) length-preserving encryption or pseudorandom
function or permutation. This applies to a wide class of encryption algorithms
including online encryption, authenticated encryption (without any nonce) etc.
and so it has an impact on designs and analysis in symmetric key cryptography.

Novelty of The Attack Idea. In [30] the minimum number of multiplications
required to achieve a $\Delta$-universal hash has been studied. Like all other differential
attacks (where zero differences are exploited), our PRP distinguisher and the
$\Delta$U attack from [30] basically find zero differences in the input of non-linear
functions for some executions. The basic intuition of our SPRP distinguishing attack
is also similar to that of the distinguishing attack on XLS. However, to make
all these applicable for general constructions, we need to find an appropriate
difference in queries. For this, we adopt methodologies from linear algebra. The
PRP distinguisher for the single-keyed inverse-free construction also exploits zero
differential propagation. However, achieving a zero differential in one more block
than expected (for a PRP distinguisher) is the tricky part of the attack. This
essentially allows us to achieve a PRP distinguisher even if we invoke one extra
block function compared to the usual PRP construction.


2 Preliminaries 

A block matrix is a binary square matrix of size n. Let $M_n(a, b)$ denote the set
of all partitioned matrices $E_{a \times b}$ (of size $a \times b$ as a block partitioned matrix and
of size $an \times bn$ as a binary matrix) whose $(i, j)$th entry, denoted $E[i, j]$, is a block
matrix for all $i \in [1..a] = \{1, \ldots, a\}$ and $j \in [1..b]$. The transpose of $E$, denoted
$E^{tr}$, is applied as a binary matrix. Thus, $E^{tr}[i, j] = E[j, i]^{tr}$. Conventionally,
any matrix $E_{a \times b}$ is written as the following block-wise partition matrices

$$E = \begin{pmatrix} E[1,1] & E[1,2] & \cdots & E[1,b] \\ E[2,1] & E[2,2] & \cdots & E[2,b] \\ \vdots & \vdots & & \vdots \\ E[a,1] & E[a,2] & \cdots & E[a,b] \end{pmatrix} := \begin{pmatrix} E[1,*] \\ E[2,*] \\ \vdots \\ E[a,*] \end{pmatrix} := \begin{pmatrix} E[*,1] & E[*,2] & \cdots & E[*,b] \end{pmatrix}$$

where $E[i, *]$ and $E[*, j]$ denote the $i$th block-row and $j$th block-column respectively.
For $1 \le i \le j \le a$, we also write $E[i..j\,; *]$ to mean the sub-matrix consisting
of all rows in between $i$ and $j$. We simply write $E[..j\,; *]$ or $E[i..\,; *]$ to denote
$E[1..j\,; *]$ and $E[i..a\,; *]$ respectively. Similar notation for columns is defined.

Definition 1. A (square) matrix $E \in M_n(a, a)$ is called (block-wise) strictly
lower triangular if for all $1 \le i \le j \le a$, $E[i, j] = 0$ (zero matrix).

For all $x = (x_1, \ldots, x_a) \in I_n^a$, we define a linear function mapping $a$ blocks to $b$
blocks as $Ex = (y_1, \ldots, y_b)$. Here, we consider $x$ and $y$ as binary column vectors
(we follow this convention which should be understood from the context). So the
block matrix $E[i, j]$ represents the contribution of $x_j$ to define $y_i$. More formally,

$$y_i = E[i,1] \cdot x_1 + E[i,2] \cdot x_2 + \cdots + E[i,a] \cdot x_a, \quad 1 \le i \le b.$$

If $E$ is a strictly lower triangular matrix then $y_i$ is clearly functionally independent
of $x_i, \ldots, x_a$, $1 \le i \le a$. So if we associate $y_i$ uniquely to each $x_i$ (e.g.,
$y_i = \rho(x_i)$ for some function $\rho$) then the choice of the vectors $x$ and $y$ satisfying
$E \cdot x = y$ becomes unique. This observation is useful while we define intermediate
inputs and outputs of a black-box based construction.

2.1 Useful Properties of Matrices 

It is well known that the maximum number of linearly independent (binary)
rows and columns of a matrix $A \in M_n(s,t)$ are the same and this number is called
the rank of the matrix, denoted rank($A$). So clearly we have rank($A$) $\le \min\{ns, nt\}$.
By using the Gaussian elimination method, denoted $x = $ solve($A, b$), we can solve
for some $x$ (not necessarily unique) of the system of solvable linear equations
$A \cdot x = b$. By convention, whenever a non-zero solution exists it returns a non-zero
solution. Note that if $w^{tr} = $ solve($A^{tr}, b^{tr}$) then $w \cdot A = b$ (by applying
transpose). The following results are straightforward and so we skip the proofs.
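
The solve convention described above can be illustrated with a small Gaussian elimination over GF(2). This is only a minimal sketch under stated assumptions: the names `solve_gf2` and `mat_vec` and the representation of a binary matrix as a list of 0/1 rows are hypothetical choices for illustration, not notation from the paper. The function returns a non-zero solution whenever one exists, and `None` when the system is not solvable.

```python
def mat_vec(A, x):
    """Multiply a binary matrix (list of 0/1 rows) by a 0/1 vector over GF(2)."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in A]


def solve_gf2(A, b):
    """Return x with A.x = b over GF(2), non-zero if possible, else None."""
    rows, cols = len(A), len(A[0])
    # Augmented matrix [A | b], copied so the caller's data is not modified.
    M = [list(A[i]) + [b[i]] for i in range(rows)]
    pivots = []  # pivots[i] is the pivot column of reduced row i
    r = 0
    for c in range(cols):
        pr = next((i for i in range(r, rows) if M[i][c]), None)
        if pr is None:
            continue  # no pivot here: column c is a free variable
        M[r], M[pr] = M[pr], M[r]
        for i in range(rows):
            if i != r and M[i][c]:
                M[i] = [p ^ q for p, q in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    # A zero row with right-hand side 1 means the system is inconsistent.
    if any(M[i][cols] for i in range(r, rows)):
        return None
    free = [c for c in range(cols) if c not in pivots]
    x = [0] * cols
    for i, c in enumerate(pivots):
        x[c] = M[i][cols]
    # Convention from the text: prefer a non-zero solution when one exists.
    if not any(x) and free:
        f = free[0]
        x[f] = 1
        for i, c in enumerate(pivots):
            x[c] = M[i][cols] ^ M[i][f]
    return x
```

For a matrix with more columns than rows and a zero right-hand side, the sketch always finds a non-zero kernel vector, which is the situation used later when a non-zero difference is needed.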

Lemma 1. Let $A \in M_n(s,t)$ and $r := $ rank($A$).

(1) If $r < ns$ (i.e. presence of row-dependency) then solve($A^{tr}, 0$) returns a
non-zero solution.

(2) Similarly for $r < nt$ (i.e. presence of column-dependency) solve($A, 0$)
returns a non-zero solution.

(3) Finally, let $r = nt$ (i.e., full column rank) and $b := A \cdot w$. Then,
solve($A, b$) $= w$ (i.e., $w$ is also the unique solution).

Lemma 2. Suppose $A \in M_n(s,s)$ is a non-singular matrix, i.e., rank($A$) $= ns$.
Let $t \le s$ and

$$B = \begin{pmatrix}
A[..t ; *] & 0 \\
0 & A[..t ; *] \\
A[t+1.. ; *] & A[t+1.. ; *]
\end{pmatrix}$$

where 0 denotes the zero matrix of appropriate size. Then, rank($B$) $= n(s+t)$
(i.e., full row-rank).

2.2 Security Definitions and Notation

In this section we quickly recall the security definitions of fixed length keyed con- 
structions. One can also extend the definitions for variable length constructions. 

PRF. We call an oracle algorithm $A$ a $(t,q)$-algorithm if it makes at most $q$ queries
and runs in time $t$. Let $\mathcal{K}$ be a key-space and $f : \mathcal{K} \times I_n^a \to I_n^b$ be a (keyed)
function. We say that $f$ is a $(q,t,\epsilon)$-PRF if for any $(t,q)$-algorithm $A$ the
prf-distinguishing advantage

$$\mathbf{Adv}^{\mathrm{prf}}_{f}(A) := \left| \Pr[A^{f_K} = 1;\ K \stackrel{\$}{\leftarrow} \mathcal{K}] - \Pr[A^{g} = 1;\ g \stackrel{\$}{\leftarrow} \mathrm{Func}(a,b)] \right|$$

is at most $\epsilon$, where Func($a,b$) denotes the set of all functions from $I_n^a$ to $I_n^b$. We
call the randomly chosen $g$ the (uniform) random function.

Notation. For notational simplicity, we skip the time parameter $t$, which is
irrelevant in this paper. We also simply write Func := Func(1, 1) and Perm to
mean the set of all functions and permutations over $I_n$.

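(S)PRP. A keyed permutation $g$ over $I_n^a$ is a function $g : \mathcal{K} \times I_n^a \to I_n^a$ such that
for all keys $k \in \mathcal{K}$, $g_k := g(k, \cdot) \in$ Perm($a$) (the set of all permutations over $I_n^a$).
We denote the uniformly chosen permutation by $\Pi_a$ and call it the uniform random
permutation. A keyed permutation $g$ is called a $(q, \epsilon)$-PRP if for any $q$-algorithm
$A$ the prp-distinguishing advantage

$$\mathbf{Adv}^{\mathrm{prp}}_{g}(A) := \left| \Pr[A^{g_K} = 1;\ K \stackrel{\$}{\leftarrow} \mathcal{K}] - \Pr[A^{\Pi_a} = 1] \right|$$

is at most $\epsilon$. By the PRF-PRP switching lemma [4,49], it is well known that
$|\mathbf{Adv}^{\mathrm{prf}}_{g}(A) - \mathbf{Adv}^{\mathrm{prp}}_{g}(A)| \le \binom{q}{2} 2^{-n}$. We similarly define the
sprp-distinguishing advantage

$$\mathbf{Adv}^{\mathrm{sprp}}_{g}(A) := \left| \Pr[A^{g_K, g_K^{-1}} = 1;\ K \stackrel{\$}{\leftarrow} \mathcal{K}] - \Pr[A^{\Pi_a, \Pi_a^{-1}} = 1] \right|$$

and the notion of a $(q, \epsilon)$-SPRP.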

2.3 Tools for Proving Security 

Given a $q$-algorithm $A$ interacting with an oracle $\mathcal{O}$ we denote the
transcript $\tau(A^{\mathcal{O}})$ by the random vector $((X_1, Y_1), \ldots, (X_q, Y_q))$ where $X_i \in I_n^a$
and $Y_i \in I_n^b$ are the $i$th query made by and response obtained by $A$ respectively.
The following theorem, known as the coefficient-H technique [36,41], is very useful
to show a construction is PRP or SPRP. It has also been adapted in [7, 25].

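Theorem 1 (Coefficient-H Technique). Let $f : \mathcal{K} \times I_n^a \to I_n^b$ be a keyed
function and $V_{bad} \subseteq (I_n^a \times I_n^b)^q$. Suppose

1. for all $q$-algorithms $B$ interacting with the uniform random function
$g \stackrel{\$}{\leftarrow} \mathrm{Func}(a,b)$, $\Pr[\tau(B^{g}) \in V_{bad}] \le \epsilon_1$ and

2. for all $\tau = ((x_1, y_1), \ldots, (x_q, y_q)) \notin V_{bad}$,

$$\Pr[f_K(x_1) = y_1, \ldots, f_K(x_q) = y_q;\ K \stackrel{\$}{\leftarrow} \mathcal{K}] \ge (1 - \epsilon_2) \times 2^{-nbq}.$$

Then, for all $q$-algorithms $A$, $\mathbf{Adv}^{\mathrm{prf}}_{f}(A) \le \epsilon_1 + \epsilon_2$.

3 Linear Mode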

3.1 Linear Query and Mode 

A block matrix $U \in M_n(\ell, a + \ell)$ is called an $(a,\ell)$-query function if $U[* ; a+1..]$
is block-wise strictly lower triangular. Here $\ell$ represents the number of queries
and $a$ represents the number of blocks in the input. For any such query function
$U$, an input $X \in I_n^a$ (and a tuple of $\ell$ functions $\rho = (\rho_1, \ldots, \rho_\ell)$ over
$I_n$), we can uniquely define or associate $u$ and $v$, called the intermediate input
and output vector respectively, satisfying (1) $U \cdot \binom{X}{v} = u$ and (2) $\rho(u) :=
(\rho_1(u_1), \ldots, \rho_\ell(u_\ell)) = v$. This can be easily shown by recursive definitions of
the $u_i$'s and $v_i$'s. More precisely, $u_i$ is uniquely determined by $v_1, \ldots, v_{i-1}$ and $X$
(through the linear function) and $v_i$ is uniquely determined by $u_i$ through $\rho_i$, for
all $1 \le i \le \ell$. Informally, an $(a, b, \ell)$-linear mode is a mode which takes an $a$-block
input and returns a $b$-block output based on executing $\ell$ block functions as building
blocks (see Fig. 1 for an illustration of a linear mode). Formally, an $(a, b, \ell)$-linear
mode is defined by a block matrix $E \in M_n(\ell + b, \ell + a)$ where $E[1..\ell ; *]$ is an
$(a,\ell)$-query function. For any $\ell$-tuple of functions $\rho \in \mathrm{Func}^\ell$, the corresponding
linear-mode function $E^{\rho} : I_n^a \to I_n^b$ is defined as $E^{\rho}(X) = Y$ where

$$E \cdot \binom{X}{v} = \binom{u}{Y}, \quad \rho(u) = v.$$



Fig. 1. Linear Mode: Here $U[i,*]$ means the $i$th block row, which maps
$(X, v_1, \ldots, v_{i-1}, 0^{\ell-i+1})$ to $u_i$. Finally, $E[\ell+1.. ; *]$ maps the input $X$ and intermediate
output vector $v$ to the output $Y$ consisting of $b$ blocks.
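
The recursive evaluation of a linear mode can be sketched in a few lines. This is a simplified illustration, not the general definition: it assumes every block entry of $E$ is either the zero block or the identity block (encoded as 0 or 1), represents each $n$-bit block as a Python integer, and uses XOR for addition. The name `eval_linear_mode` is hypothetical.

```python
def eval_linear_mode(E, rho, X, b):
    """Evaluate a linear mode whose block entries are 0 (zero) or 1 (identity).

    E   : list of ell + b rows, each with a + ell coefficients in {0, 1}
    rho : list of ell block functions (int -> int)
    X   : list of a input blocks (ints)
    b   : number of output blocks
    """
    a, ell = len(X), len(rho)
    assert len(E) == ell + b and all(len(row) == a + ell for row in E)
    v = []
    for i in range(ell):
        row = E[i]
        # Strictly lower triangular on the v-part: u_i may only use v_1..v_{i-1}.
        assert all(c == 0 for c in row[a + i:])
        u_i = 0
        for coeff, value in zip(row, list(X) + v):
            if coeff:
                u_i ^= value
        v.append(rho[i](u_i))  # v_i is determined by u_i through rho_i
    full_input = list(X) + v
    Y = []
    for row in E[ell:]:
        y = 0
        for coeff, value in zip(row, full_input):
            if coeff:
                y ^= value
        Y.append(y)
    return Y
```

Because each query row only refers to earlier intermediate outputs, the loop computes $u_i$ and $v_i$ in order, which mirrors the uniqueness argument in the text.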

So $v$ is the intermediate output vector associated to the input $X$ and the final
output is $Y := E[\ell+1.. ; *] \cdot \binom{X}{v}$, a linear function of $v$ and $X$. Now we state a useful
differential property of linear mode. Note that the functions of $\rho$ are non-linear
and would be secret for the adversaries. So to obtain any information about
the intermediate input and output, we can only equate intermediate outputs
whenever two inputs collide for the same function. Given any vectors $x, x'$ of the same
size, we write $\Delta x$ to mean $x \oplus x'$ and $\Delta_{a..b} x$ to mean $(x_a \oplus x'_a, \ldots, x_b \oplus x'_b)$. We
simply write $\Delta_t x$ to mean $\Delta_{1..t} x$ (the first $t$ elements of $\Delta x$) (Fig. 2).

Lemma 3. Suppose $E[..t ; *] \cdot X = E[..t ; *] \cdot X'$ (i.e., $E[..t ; *] \cdot \Delta X = 0$). Let
$E^{\rho}(X) = Y$, $E^{\rho}(X') = Y'$. Let $v, v'$ and $u, u'$ denote intermediate outputs and
inputs respectively associated with $X$ and $X'$ (for the function tuple $\rho$).
Then, $\Delta_t u = \Delta_t v = 0^t$ and

$$\Delta Y = E[\ell+1.. ; ..a] \cdot \Delta X + E[\ell+1.. ; a+t+1..] \cdot \Delta_{t+1..} v.$$

Fig. 2. Differential pattern of the linear mode: we choose $\Delta X$ such that the first $t$ input
differences of the $\rho$ functions are zero. So the final difference $\Delta Y$ can be expressed as
a linear function of the rest of the differences $\Delta_{t+1..} v$ and $\Delta X$.


Proof. Due to the choice of $X$ and $X'$, by induction one can show that $(u_1, v_1) =
(u'_1, v'_1), \ldots, (u_t, v_t) = (u'_t, v'_t)$ where $u'$ and $v'$ denote the intermediate inputs
and outputs associated with $X'$ (for the function tuple $\rho$). In other words,
$\Delta_t u = \Delta_t v = 0^t$. Now, $Y = E[\ell+1.. ; a+1..] \cdot v + E[\ell+1.. ; ..a] \cdot X$ and similarly
$Y' = E[\ell+1.. ; a+1..] \cdot v' + E[\ell+1.. ; ..a] \cdot X'$. The result follows after we
add these two equations and use that $\Delta_t v = 0^t$. □

3.2 Keyed Constructions Based on Linear Mode 

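Keyed Linear Mode. Let $\mathcal{F} = \mathcal{F}_1 \times \cdots \times \mathcal{F}_f$ and $k$ be a non-negative integer
where $\mathcal{F}_i \subseteq$ Func. A key-space $\mathcal{K}$ for any keyed function is of the form $I_n^k \times \mathcal{F}$.
We call $\mathcal{F}$ the function-key space and $I_n^k$ the masking-key space. Any function $g$ is
also written as $g^{+1}$.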

Definition 2. Let $\pi : [1..\ell] \to [1..f]$, called the key-assignment function, and $\alpha :=
(\alpha_1, \ldots, \alpha_\ell) \in \{+1, -1\}^\ell$, called the inverse-assignment tuple. For any function-key
$\rho = (\rho_1, \ldots, \rho_f) \in \mathcal{F}$ define $\rho^{\pi,\alpha} := (\rho_{\pi_1}^{\alpha_1}, \ldots, \rho_{\pi_\ell}^{\alpha_\ell})$. We denote the set of all
functions $\rho^{\pi,\alpha}$ by $\mathcal{F}^{\pi,\alpha}$.

Here we implicitly assume that whenever $\alpha_i = -1$, $\rho_{\pi_i}$ is a permutation. If
$\alpha = 1^\ell$, we simply skip the notation $\alpha$. In general, the presence of inverse calls of
building blocks may be required when we consider decryption of a keyed function.
For the encryption, or a keyed function where decryption is not defined, w.l.o.g.
we may assume that $\alpha = 1^\ell$.

Definition 3. A $(k, a, b)$ keyed linear mode with key-space $\mathcal{K}$ and key-assignment
function $\pi$ is an $(a+k, b, \ell)$ linear mode $E$. For each key $\kappa := (L, \rho) \in \mathcal{K} := I_n^k \times \mathcal{F}$,
we define a keyed function $E_\kappa(P) := E^{\rho^{\pi}}(L, P)$.

A keyed linear mode $E$ is actually a linear mode in which a part of the input is the
masking key and the function tuple is derived by reusing some keyed block
functions.

Example 1. Consider the simple variant of PMAC [8,45] defined over $I_n^a$ (see
Fig. 3). Let $(p_1, \ldots, p_a)$ be the input. We set

$$u_i = p_i \ \text{ for } 1 \le i \le a-1, \quad \text{and} \quad u_a = p_a \oplus \bigoplus_{j=1}^{a-1} v_j.$$

Fig. 3. The simplified structure of PMAC. The input is $(p_1, \ldots, p_a)$ and the output
is $c_1$.


Finally the output is defined as $c_1 = v_a$. Here $\ell = a$ and $b = 1$. There is no
masking key, i.e. $k = 0$, and $f = a$ (all function-keys are independently chosen).
The key-assignment function $\pi$ is the identity function.

In a single function-key version of PMAC (with independent masking key),
we have $f = 1 = k$. Then $u_i = \alpha^i \cdot L \oplus p_i$ for $1 \le i < a$ and $u_a = p_a \oplus (\bigoplus_{j=1}^{a-1} v_j) \oplus L$.
Here the key-assignment function maps all indices to the key-index 1 (as there
is only one choice of key).
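
The first variant of Example 1 (independent function-keys, no masking key) can be sketched as follows. This is an illustrative sketch only: blocks are modelled as Python integers with XOR as addition, and the name `simple_pmac` is hypothetical. The single-key variant with the masking key $L$ is not covered here because it needs finite-field multiplication.

```python
def simple_pmac(rho, p):
    """Simplified PMAC: u_i = p_i for i < a, u_a = p_a XOR (v_1 XOR ... XOR v_{a-1}).

    rho : list of a block functions (one independent function per call)
    p   : list of a input blocks (ints)
    Returns the single output block c_1 = v_a.
    """
    a = len(p)
    assert len(rho) == a and a >= 1
    acc = 0
    for i in range(a - 1):
        acc ^= rho[i](p[i])          # v_i = rho_i(u_i) with u_i = p_i
    return rho[a - 1](p[a - 1] ^ acc)  # c_1 = v_a = rho_a(u_a)
```

The first $a-1$ calls are independent of each other, so they could be computed in parallel; only the last call depends on the other intermediate outputs.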

Affine Domain Extension or ADE [29]. As defined in [29], an affine domain
extension over $I_n^a$ is nothing but an $(a, 1, \ell)$-linear mode keyed function $E$ such
that the key-space is $\mathcal{K} = \mathcal{F} \subseteq$ Func, i.e., $f = 1$ (single function-key) and $k = 0$
(no masking key). Moreover, the final output is the response of the last oracle
call, i.e. $v_\ell$. Like PMAC, the key-assignment function for ADE maps all indices to
the key-index 1. One can consider an injective padding rule and a sequence of such
constructions indexed by $a$ to incorporate variable length inputs. CBC-MAC [5],
PMAC [8, 24, 33], TMAC [22], OMAC [18, 27], DAG-based constructions [20] etc.
are some examples of ADE.

Length Preserving Linear Encryption Mode. A keyed linear mode $E$ is
called a length-preserving (LP) encryption if $E_\kappa$ is an encryption scheme and $a = b$.
In addition, we assume that its decryption algorithm $D$ is also a
keyed linear mode, which is indeed true for all known linear encryption modes.
We first see an example below.

Example 2. As an example, consider the Luby-Rackoff (LR) keyed function with
three rounds using two random functions $\rho_1, \rho_2$, i.e. $f = 2$, $a = b = 2$ and
$\ell = 3$ (three invocations of the underlying block functions). Consider the key-assignment
function $\pi$ with $\pi_1 = 1$, $\pi_2 = 1$ and $\pi_3 = 2$. So the function tuple
after applying the key-assignment is $(\rho_1, \rho_1, \rho_2)$. As there is no masking key, we
have $k = 0$. So the key-space is $\mathrm{Func}^2$. Given $(p_1, p_2) \in I_n^2$ we define

$$u_1 := p_1,\ v_1 = \rho_1(u_1),\ u_2 = v_1 + p_2,\ v_2 = \rho_1(u_2),\ u_3 = v_2 + p_1,\ v_3 = \rho_2(u_3).$$

Finally, the output is $(c_1, c_2)$ where $c_1 := u_3$ and $c_2 = v_3 + u_2$. This is clearly
decryptable. Consider the $u_i$'s, $v_i$'s and $p_i$'s as variables. The ciphertext provides
two linear functions of these variables, namely $u_3$ and $v_3 + u_2$. So $u_3$ is in the
span. As $u_3$ is in the span, $v_3$ is also computable. Thus $u_2$ is in the span of the
extended ciphertext including $v_3$. Again $v_2$ is computable and hence $u_1 := p_1$ is
in the extended span. Finally, $p_2$ is in the span after including $v_1$. So we see
that the decryption algorithm is also a linear mode (Fig. 4).


Fig. 4. LR with three rounds.
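
The three-round construction of Example 2 and the span argument for its decryption can be checked with a short sketch. The function names `lr3_encrypt` and `lr3_decrypt` are hypothetical, blocks are modelled as integers, and XOR plays the role of the addition in the text.

```python
def lr3_encrypt(rho1, rho2, p1, p2):
    """Three-round LR of Example 2 with function tuple (rho1, rho1, rho2)."""
    u1 = p1
    v1 = rho1(u1)
    u2 = v1 ^ p2
    v2 = rho1(u2)
    u3 = v2 ^ p1
    v3 = rho2(u3)
    return u3, v3 ^ u2  # (c1, c2)


def lr3_decrypt(rho1, rho2, c1, c2):
    """Recover (p1, p2) by calling the same functions in reverse order."""
    u3 = c1            # c1 = u3 is directly available
    v3 = rho2(u3)
    u2 = v3 ^ c2       # c2 = v3 + u2
    v2 = rho1(u2)
    p1 = v2 ^ u3       # u3 = v2 + p1
    v1 = rho1(p1)      # u1 = p1
    p2 = v1 ^ u2       # u2 = v1 + p2
    return p1, p2
```

Decryption evaluates $\rho_2, \rho_1, \rho_1$ on the inputs $u_3, u_2, u_1$, so it uses the same intermediate input-output pairs as encryption in reverse order and needs no inverse calls.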


Decryption Algorithm of a Keyed Linear Encryption Mode. From the 
above example, it is clear that the intermediate input outputs for the building 
blocks would be same if we encrypt and then decrypt as we do in the correctness 
condition: D K (E K (P)) = P. Informally, if some input-output does not arise in 
the decryption then either this input-output is redundant in the encryption 
computation or the correctness condition does not hold (due to randomness of the 
output which has influence in the encryption but is not used in the decryption). 
We now describe the details of a length preserving linear encryption mode for 
which all invocations of block function calls are not redundant. 

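Definition 4 (Reordering of Vectors). Let $\alpha := (\alpha_1, \ldots, \alpha_\ell) \in \{1, -1\}^\ell$,
and $\beta = (\beta_1, \ldots, \beta_\ell)$ be a permutation over $[1..\ell]$. A pair of vectors $(w, z) \in I_n^\ell \times I_n^\ell$
is an $(\alpha, \beta)$-reordering of a pair of vectors $(u, v) \in I_n^\ell \times I_n^\ell$ if

$$(w_i, z_i) = \begin{cases} (u_{\beta_i}, v_{\beta_i}) & \text{if } \alpha_i = 1, \\ (v_{\beta_i}, u_{\beta_i}) & \text{if } \alpha_i = -1. \end{cases}$$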


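Definition 5. A $(k+a, a, \ell)$-linear mode $E$ is called a linear-mode length-preserving
encryption with key-space $\mathcal{K} := I_n^k \times \mathcal{F}$ and key-assignment $\pi$ if the
corresponding decryption algorithm $D$ is also a $(k+a, a, \ell)$-linear mode with
(1) an inverse assignment-tuple $\alpha := (\alpha_1, \ldots, \alpha_\ell) \in \{1, -1\}^\ell$ and (2) key-assignment
$\pi' := \pi \circ \beta$ where $\beta = (\beta_1, \ldots, \beta_\ell)$ is a permutation over $[1..\ell]$.
Moreover, $\forall P \in I_n^a$, $L \in I_n^k$, $\rho = (\rho_1, \ldots, \rho_f) \in \mathcal{F}$,

$$E \cdot \begin{pmatrix} L \\ P \\ v \end{pmatrix} = \begin{pmatrix} u \\ C \end{pmatrix}, \quad \rho_{\pi_1}(u_1) = v_1, \ldots, \rho_{\pi_\ell}(u_\ell) = v_\ell$$

if and only if

$$D \cdot \begin{pmatrix} L \\ C \\ z \end{pmatrix} = \begin{pmatrix} w \\ P \end{pmatrix}, \quad \rho_{\pi'_1}^{\alpha_1}(w_1) = z_1, \ldots, \rho_{\pi'_\ell}^{\alpha_\ell}(w_\ell) = z_\ell$$

where $(w, z)$ is the $(\alpha, \beta)$-reordering of $(u, v)$.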

The above definition implies the correctness condition of an encryption,
$D^{\rho^{\pi',\alpha}}(L, E^{\rho^{\pi}}(L, P)) = P$. In addition to the correctness condition, the intermediate
inputs and outputs for both encryption and decryption are simply
reordered. In Example 2 (given above), we have $a = b = f = 2$, $\ell = 3$. For
the decryption algorithm, we execute the functions in the reverse order and
so we set $\beta_1 = 3$, $\beta_2 = 2$, $\beta_3 = 1$. So the key-assignment function for the
decryption is $\pi'_1 = 2$, $\pi'_2 = 1$, $\pi'_3 = 1$. We do not need to apply the inverse for
the decryption (it is called inverse-free) and so the inverse-assignment tuple is $1^3$.
So if $(u_1, v_1), (u_2, v_2)$ and $(u_3, v_3)$ are the intermediate input-output pairs for
encryption then $(u_3, v_3), (u_2, v_2)$ and $(u_1, v_1)$ (a reordering of the previous pairs)
are the intermediate input-output pairs for decryption.

Examples. EME [16], ELmE [11], AEZ [1], CMC [15] (these follow Encrypt- 
Mix-Encrypt paradigm), Luby-Rackoff with a = b = 2, unbalanced Feistel [17, 
48] etc. are some examples of length-preserving linear mode encryptions. HCBC1, 
HCBC2 [3], Modified-HCBC’s, ELmD [1], MCBC [26], COPE [2] etc. are some 
examples of online computable length-preserving encryptions based on linear 
mode. 


4 PRP and SPRP Distinguishing Attacks 

Consider a length-preserving encryption scheme based on a $(k+a, a, \ell)$ linear mode
$E$. We show two main results in this section. Namely, we provide PRP
and SPRP distinguishing attacks on the encryption scheme if $\ell \le 2a-2$ and
$\ell \le 2a-1$ respectively. Thus, this gives a lower bound on the number of invocations
of building blocks for achieving PRP and SPRP security.

4.1 PRP Distinguishing Attack on E with $\ell = 2a - 2$

Let us assume $\ell = 2a-2$. The attack can be trivially extended to all those
constructions with $\ell < 2a-2$. We recall that $E_L^{\rho}(P) = C$ if and only if

$$E \cdot \begin{pmatrix} L \\ P \\ v \end{pmatrix} = \begin{pmatrix} u \\ C \end{pmatrix}, \quad \rho^{\pi}(u) = v.$$

Distinguisher $D^{prp}$ against $(k+a, a, 2a-2)$-Linear mode E.

1. step-1 (finding a suitable difference in a pair of plaintext queries): Let $d \in I_n^a$
be the non-zero solution of solve($E[..a-1 ; k+1..k+a], 0$), i.e. $E[..a-1 ;
k+1..k+a] \cdot d = 0$. Such a non-zero solution exists as the number of columns
is more than that of rows (see Lemma 1).

2. step-2 (make the queries with the difference obtained in step-1): Now the
distinguisher makes two queries $0^a$ and $d$ and obtains corresponding responses
$c = E_L^{\rho}(0)$ and $c' = E_L^{\rho}(d)$. Let

$$u_1, v_1, \ldots, u_{2a-2}, v_{2a-2} \quad \text{and} \quad u'_1, v'_1, \ldots, u'_{2a-2}, v'_{2a-2}$$

denote the intermediate inputs and outputs for the two queries respectively. By
Lemma 3, we have $u_i = u'_i$, $v_i = v'_i$ for $1 \le i \le a-1$, and

$$\Delta c = E[2a-1.. ; k+1..(a+k)] \cdot d + E[2a-1.. ; 2a+k..] \cdot \Delta_{a..} v$$

while it is interacting with the keyed construction.

3. step-3 (find a nullifier of unknown intermediate values): As the matrix
$E[2a-1.. ; 2a+k..]$ is an $a \times (a-1)$ matrix, we find a non-zero binary vector
$w \in \{0,1\}^{na}$ such that $w \cdot E[2a-1.. ; 2a+k..] = 0$. In particular,
$w = $ solve($E[2a-1.. ; 2a+k..]^{tr}, 0$).

4. step-4 (the distinguisher event): If $w \cdot \Delta c = w \cdot E[2a-1.. ; k+1..(a+k)] \cdot d$
then it returns 1 (decision for the keyed construction), else returns 0 (decision
for uniform random permutation).

The distinguishing advantage of the above distinguisher $D$ is at least 1/2
since for a random permutation $w \cdot \Delta c = w \cdot E[2a-1.. ; k+1..(a+k)] \cdot d$ holds with
probability 1/2, whereas we have seen this holds with probability one for the
keyed construction. When $a = 2$, we know that LR with three rounds is PRP.
This shows the bound is tight at least for $a = 2$.
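
As a concrete instance of the steps above, take $a = 2$, $k = 0$ and $\ell = 2a - 2 = 2$, i.e. a two-round Feistel construction. The following sketch is an illustration under assumptions, not the paper's general algorithm: the names `lr2_encrypt` and `distinguish` are hypothetical, and the check compares the whole first output block rather than a single linear combination $w$, which is a stronger test than the one-bit event in the text.

```python
def lr2_encrypt(rho1, rho2, p1, p2):
    """Two-round Feistel as a (2, 2, 2)-linear mode: u1 = p1, u2 = v1 + p2."""
    v1 = rho1(p1)
    u2 = v1 ^ p2
    v2 = rho2(u2)
    return u2, v2 ^ p1  # (c1, c2)


def distinguish(oracle, delta=1):
    """Return 1 (keyed construction) or 0 (random permutation).

    step-1: d = (0, delta) leaves the first intermediate input u1 = p1 unchanged.
    step-2: query 0 and d.
    step-3: the nullifier selects c1, which does not depend on the unknown v2.
    step-4: check the predicted linear relation on the first output block.
    """
    assert delta != 0
    c = oracle(0, 0)
    c_prime = oracle(0, delta)
    return 1 if (c[0] ^ c_prime[0]) == delta else 0
```

For the keyed construction the relation holds with probability one, since $\Delta c_1 = \Delta v_1 \oplus \Delta p_2 = \delta$ whenever the two queries share the same first block.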

A Generalized Distinguisher $D_t^{prp}$ Against $(k+a, a, \ell)$-Linear Mode E.

Now we define a distinguisher against a $(k+a, a, \ell)$-linear mode $E$ assuming certain
singularities in the sub-matrices.

Assumption: Suppose there exists an integer $t$ such that

1. rank($E[..t ; ..a]$) $< na$ and

2. rank($E[\ell+1.. ; a+k+t+1..]$) $< na$.

Note the above assumption always holds for $t = a-1$ when $\ell \le 2a-2$.
However, if $\ell \ge 2a-1$, both conditions do not necessarily hold. Whenever
the assumptions hold, we have the following distinguisher, similar to the one
mentioned before. This distinguisher will be used later on while describing SPRP
distinguishers.

Distinguisher $D_t^{prp}$ Against $(k+a, a, \ell)$-linear Mode E.

1. step-1. Due to the assumptions, we can find $d$ and $w$ such that $E[..t ; ..a] \cdot
d = 0$ and $w \cdot E[\ell+1.. ; a+k+t+1..] = 0$.

2. step-2. Then we make two queries 0 and $d$ and obtain responses $c$ and $c'$.

3. step-3. The distinguisher returns 1 if $w \cdot \Delta c = w \cdot E[\ell+1.. ; k+1..(a+k)] \cdot d$,
else 0.

4.2 SPRP Distinguishing Attack on E with $\ell = 2a - 1$

Now we show that if $\ell < 2a$ then we have an SPRP distinguisher. In other words,
$2a$ invocations are the minimum needed to achieve SPRP, which is tight as it is
achieved in CMC. The basic intuition of our attack is similar to that of XLS.
However, to complete the attack for any linear-mode encryption we need to
carefully set the queries and the distinguishing event. Consider a length-preserving
$(k, a, 2a-1)$-encryption scheme based on a $(k+a, a, 2a-1)$-linear mode $E$. Let
us denote the $(k+a, a, 2a-1)$-linear mode for its decryption by $D$. We describe
three distinguishers depending on cases.


Case 1: rank($E[2a.. ; 2a+k..]$) $< na$. In this case, the two assumptions
mentioned above hold for $t = a-1$. So we can run the PRP-distinguisher $D_t^{prp}$.

Case 2: rank($D[..a ; k+1..k+a]$) $< na$. In this case, the two assumptions
also hold for $t = a$ for the decryption function. So we run our general PRP
distinguisher applied to the decryption function.

Case 3: rank($D[..a ; k+1..k+a]$) $= na$, rank($E[2a.. ; 2a+k..]$) $= na$.

Here we describe a SPRP distinguisher. Briefly, it works as follows. It first 
makes two queries as in step- 2 (the first a — 1 intermediate input and outputs 
are identical for two encryption queries). Using the invertible property we can 
actually obtain all the differences of intermediate values. As the computation of 
decryption algorithm must use same internal input and outputs of the building 
blocks, we also know the differences of intermediate inputs and outputs if we 
decrypt the first two encryption queries. Now we find another decryption query 
for which the first a intermediate input and output differences with one of the 
first two queries are fixed. So we can nullify the unknown a — 1 differences and 
obtain a distinguishing event. The details are described below. 

Distinguisher $D^{sprp}$ Against $(k+a, a, 2a-1)$-Linear Mode E.

1. step-1 (make two queries with a certain difference, same as PRP distinguisher):
Let $d \in I_n^a$ be the non-zero solution of solve($E[..a-1 ; k+1..k+a], 0$),
i.e. $E[..a-1 ; k+1..k+a] \cdot d = 0$. It makes two queries $0^a$ and $d$ and obtains
corresponding responses $c = E_L^{\rho}(0)$ and $c' = E_L^{\rho}(d)$.

Let $u_1, v_1, \ldots, u_{2a-1}, v_{2a-1}$ and $u'_1, v'_1, \ldots, u'_{2a-1}, v'_{2a-1}$ denote the
intermediate inputs and outputs for the two queries respectively. By Lemma 3, we have
$u_i = u'_i$, $v_i = v'_i$ for $1 \le i \le a-1$, and

$$\Delta c = E[2a.. ; k+1..(a+k)] \cdot d + E[2a.. ; 2a+k..] \cdot \Delta_{a..} v$$

while it is interacting with the keyed construction.

2. step-2 (solve for $\Delta u$, $\Delta v$): Using the invertible property of $E[2a.. ; 2a+k..]$,
we can actually solve for $\Delta_{a..} v$ and hence $\Delta_{a..} u$. Thus, we know $\Delta u$ and $\Delta v$. Suppose
we make two (redundant) decryption queries $c$ and $c'$ (whose responses
must be 0 and $d$) and let $w_1, z_1, \ldots, w_{2a-1}, z_{2a-1}$ and $w'_1, z'_1, \ldots, w'_{2a-1}, z'_{2a-1}$
denote the intermediate inputs and outputs for the two queries respectively. Then
by the definition of the decryption algorithm we also know $\Delta w$, $\Delta z$, which are
nothing but the $(\alpha, \beta)$-reordering of $(\Delta u, \Delta v)$.

3. step-3 (find a difference for the final decryption query): Now we find a
difference $d'$ such that

D[.. a ; k + l..k + a + 1] • = (^l) • 

We can solve for a non-zero $d'$. This can be solved assuming that $\Delta w_1 \ne 0$ (see
the remark below). Note that the matrix $D[..a ; k+1..k+a]$ is invertible. Now
we make two decryption queries $c$ and $c' = c + d'$. While we set the two queries
we should ensure that none of these have been obtained in the first two
encryption queries (these are also called non-pointless or non-trivial queries).
Let $w_1, z_1, \ldots, w_{2a-1}, z_{2a-1}$ and $w'_1, z'_1, \ldots, w'_{2a-1}, z'_{2a-1}$ denote the intermediate
inputs and outputs for these two queries respectively and let $p$ and $p'$ denote
the corresponding responses. By choice of d! we know that z\ = z[ and 
A-2..„ =()■' T . 

4. step-4 (find a nullifier of unknown intermediate values, same as PRP distinguisher):
As $D[2a.. ; 2a+k+1..]$ is an $a \times (a-1)$ matrix, we find a non-zero binary
vector $w \in \{0,1\}^{na}$ such that $w \cdot D[2a.. ; 2a+k+1..] = 0$.

5. step-5 (the distinguisher event): If $w \cdot (p \oplus p') = w \cdot D[2a.. ; k+1..(a+k)] \cdot d'$
then it returns 1 (decision for the keyed construction), else returns 0 (decision
for uniform random permutation).

Remark 1. In the above attack we assume that $\Delta w_1 \ne 0$ since otherwise we
do not get a non-zero $d'$. Note that $\Delta w_1$ can be written as a function of $c$ and
$c'$. For a random permutation, the event that a function of $c$ and $c'$ becomes zero
has low probability. So we may assume that $\Delta w_1 \ne 0$.

5 Security Analysis of Inverse-Free Single Key 
Construction 

5.1 PRP Attack of Single-Key Inverse- Free Constructions 
Without Masking 

In the last section, we have seen that to obtain PRP, we need at least $2a-1$
invocations and this is tight as three rounds of LR achieves this bound.
Note that the three calls of the building block cannot have the same key. In [28],
it is also shown that three rounds of LR-type rounds with the same key building
block cannot be PRP. However, their result is applicable to a specific form of
encryption schemes. Now, we generalize this result and show that any inverse-free
single function-key (and no masking key) PRP requires at least $2a$ calls.
In [28], there is a construction of an inverse-free SPRP over two blocks invoking the
underlying function (single keyed) four times. So the bound is tight. Interestingly,
the costs of PRP and SPRP become the same when we want inverse-free single
function-key constructions.

Consider a length-preserving encryption scheme based on an $(a, a, 2a-1)$-linear
mode $E$. Let us denote the $(a, a, 2a-1)$-linear mode for its decryption by $D$. Since
it is inverse-free, the inverse-assignment for the decryption is $\alpha = (1, 1, \ldots, 1)$.
As it is based on a single function-key, the key-assignment is a constant function,
i.e., $\pi_i = \pi'_i = 1$. However, there exists a permutation $\beta$ over $[1..2a-1]$ such
that $w$ and $z$ are the $\beta$-reordering of $u$ and $v$ respectively, where $u, v$ denote the
intermediate input and output respectively for $E^{\rho}(P) = C$, and similarly $w, z$ for
$D^{\rho}(C) = P$. We first briefly describe how we can construct a PRP-distinguisher
(similar to the SPRP one). The attack is similar to SPRP but we cannot make decryption
queries. We see how we can manage even if we are not allowed to make decryption
queries.

We make two encryption queries such that $\Delta_{a-1} u = \Delta_{a-1} v = 0^{a-1}$. This is
possible as we have $a$ many plaintext blocks. Assuming some invertible property,
we can find out the whole differences $\Delta u$ and $\Delta v$ for these two queries. For these
two queries, if we look at the decryption computation then the first inputs, say
$w_1, w'_1$, and their corresponding output difference $\Delta z_1$ (not the exact outputs)
for both decryptions are known (as there is no masking key). So now we make
two encryption queries with the following restrictions on intermediate values
$u, u', v$ and $v'$: $u_1 = w_1$, $u'_1 = w'_1$, $\Delta_{2..a} u = \Delta_{2..a} u'$, $\Delta_{2..a} v = \Delta_{2..a} v'$. As we have
obtained differences for the first $a$ inputs in a determined manner, we can nullify
the remaining $a-1$ intermediate differences and obtain a distinguishing event.
More details of the attack are given below depending on different cases. Note
that the matrix $E \in M_n(3a-1, 3a-1)$.

Distinguisher $D^{prp}$ Against $(a, a, 2a-1)$-Linear-Mode E
(with Corresponding Decryption Mode D).

Case 1: rank($E[2a.. ; 2a..]$) $< na$. In this case, the two assumptions
mentioned before hold for $t = a-1$. So we have our general PRP distinguisher.

Case 2: rank($E[1..a ; ..a]$) $< na$. In this case, the two assumptions also hold
for $t = a$. So we have our general PRP distinguisher.

Case 3: rank($E[1..a ; ..a]$) $= na$, rank($E[2a.. ; 2a..]$) $= na$. Here we
describe a PRP distinguisher which works similarly to the SPRP distinguisher and as
described above.


1. step-1 (make two queries with a certain difference, same as PRP distinguisher):
Let $d \in I_n^a$ be the non-zero solution of solve($E[..a-1 ; ..a], 0$), i.e.
$E[..a-1 ; ..a] \cdot d = 0$. It makes two queries $0^a$ and $d$ and obtains corresponding
responses $c = E^{\rho}(0)$ and $c' = E^{\rho}(d)$.

Let $u_1, v_1, \ldots, u_{2a-1}, v_{2a-1}$ and $u'_1, v'_1, \ldots, u'_{2a-1}, v'_{2a-1}$ denote the
intermediate inputs and outputs for the two queries respectively. By Lemma 3, we have
$u_i = u'_i$, $v_i = v'_i$ for $1 \le i \le a-1$, and

$$\Delta c = E[2a.. ; ..a] \cdot d + E[2a.. ; 2a..] \cdot \Delta_{a..} v$$

while it is interacting with the keyed construction.

2. step-2 (solve for $\Delta u$, $\Delta v$): Using the invertible property of $E[2a.. ; 2a..]$, we
can actually solve for $\Delta_{a..} v$ and hence $\Delta_{a..} u$. Thus, we know $\Delta u$ and $\Delta v$. Now
note that the first input of decryption $D$ is only based on $c$ and $c'$. Let $\beta$ be the
permutation corresponding to the reordering of intermediate inputs and outputs
for decryption. So the values of $u_{\beta_1}$ and $u'_{\beta_1}$ are known (as they depend only
on $c$ and $c'$ due to no masking keys and the inverse-free property). Moreover, we
know $\Delta v_{\beta_1}$. Here we assume the difference $\Delta u_{\beta_1}$ is non-zero; otherwise, we
can have a different distinguishing event, as a zero difference can occur only with low
probability for a random permutation.

3. step-3 (find a difference for two more encryption queries): Now we find a
solution $p$ and $p'$ such that

$$\begin{pmatrix}
E[1,*] & 0 \\
0 & E[1,*] \\
E[2..a,*] & E[2..a,*]
\end{pmatrix}$$




This can be solved as the matrix has full row rank (see Lemma 2). Now we make
two encryption queries $p$ and $p'$ and obtain outputs $c$ and $c'$. Let $u, v, u'$ and
$v'$ be the intermediate inputs and outputs for these two queries respectively.
So $u_1 = u_{\beta_1}$, $u'_1 = u'_{\beta_1}$, $\Delta v_1 = \Delta v_{\beta_1}$ and $\Delta_{2..a} u = \Delta_{2..a} v = 0^{a-1}$. Thus,
the $a$-block output difference $\Delta c$ depends only on the $a-1$ blocks of the
intermediate output difference $\Delta_{a+1..} v$.

4. step-4 (find a nullifier of unknown intermediate values, same as PRP distinguisher):
As $E[2a.. ; 2a+1..]$ is an $a \times (a-1)$ matrix, we find a non-zero binary
vector $w \in \{0,1\}^{na}$ such that $w \cdot E[2a.. ; 2a+1..] = 0$.

5. step-5 (the distinguisher event): If w • (p ® d) = w • D[2a.. ; ..a] • d' then 
it returns 1 (decision for the keyed construction), else returns 0 (decision for 
uniform random permutation). 


5.2 PRP Security of Single-Key Luby-Rackoff with Masking 

Define one round of Luby-Rackoff as $LR^{f}(a, b) = (b \oplus f(a), a)$ where $a, b \in I_n$ and
$f \in$ Func. In [28] it was shown that three rounds of some variants of LR
rounds with a single function key are not PRP secure. In the last section we have also
generalized this and showed that any encryption making three calls over a two-block
input with key space $\mathcal{K} = \mathcal{F} = $ Func is not PRP secure. However, we now
show that a simple variant of LR with a masking key becomes PRP secure.

Definition 6. For any $f \in$ Func, $L \in I_n$, we define (see Fig. 5 below)

$$LR_L^{f,3}(a, b) = LR^{f}(LR^{f}(LR^{f}(a + L, b))).$$

Fig. 5. LR-three rounds single function-key and one masking key.
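
Definition 6 translates directly into code. The sketch below is illustrative only: the names `lr_round`, `lr3_masked` and `lr3_masked_decrypt` are hypothetical, blocks are modelled as integers, and XOR plays the role of both $\oplus$ and the addition of the masking key. The decryption routine is included to show that the construction is inverse-free, i.e. it only ever evaluates $f$ in the forward direction.

```python
def lr_round(f, a, b):
    """One Luby-Rackoff round: LR^f(a, b) = (b XOR f(a), a)."""
    return b ^ f(a), a


def lr3_masked(f, L, a, b):
    """Three LR rounds with a single function f and masking key L on the first block."""
    x = lr_round(f, a ^ L, b)
    x = lr_round(f, *x)
    return lr_round(f, *x)


def lr_round_inv(f, c, d):
    """Invert one round: if (c, d) = (b XOR f(a), a) then a = d and b = c XOR f(d)."""
    return d, c ^ f(d)


def lr3_masked_decrypt(f, L, c, d):
    """Undo the three rounds, then remove the mask from the first block."""
    x = lr_round_inv(f, c, d)
    x = lr_round_inv(f, *x)
    a_masked, b = lr_round_inv(f, *x)
    return a_masked ^ L, b
```

The only difference from the unmasked single-key construction is the XOR of $L$ into the first plaintext block before the first round.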


Now we show that the above construction with key-space $\mathcal{K} = I_n \times$ Func is
PRP. Note that we have a constant key-assignment (i.e., we reuse the PRF for all
invocations) and also the inverse assignment tuple is $1^3$. Let $f$ denote the uniform
random function on $I_n$. Given a tuple of elements $c = (c_1, \ldots, c_t)$ we say that
the event coll($c$) holds if there exists $i \ne j$ such that $c_i = c_j$. We define

$$V_{bad} = \{((a_1, b_1, c_1, d_1), \ldots, (a_q, b_q, c_q, d_q)) \in I_n^{4q} : \mathrm{coll}(c)\}.$$

It is easy to see that for a uniform random function $g$ from $I_n^2$ to $I_n^2$ and a $q$-algorithm $A$,

$$\Pr[\tau(A^{g}) \in V_{bad}] \le \binom{q}{2} 2^{-n}.$$

Now we show the high interpolation probability of the variant of 3 round LR 
construction. 

Proposition 1. For all $\tau = ((a_1, b_1, c_1, d_1), \ldots, (a_q, b_q, c_q, d_q)) \notin V_{bad}$, we have
$\Pr[\tau] := \Pr[LR_L^{f,3}(a_i, b_i) = (c_i, d_i),\ 1 \le i \le q] \ge (1 - \epsilon) 2^{-2nq}$
where e = • 

Proof. We say that a tuple $(L_0, (x_i)_{1 \le i \le q})$ is admissible if

1. $L_0 \notin \{a_i + c_j ;\ 1 \le i, j \le q\} \cup \{a_i + x_j ;\ 1 \le i, j \le q\}$,

2. the $x_i$'s are distinct and $x_i \ne c_j$, $1 \le i, j \le q$ and

3. whenever $a_i = a_j$, we have $x_i + x_j = b_i + b_j$.

Let $\mathcal{A}$ denote the set of admissible tuples. Let $q_1$ be the number of distinct
$a_i$'s. The number of $(L_0, x = (x_1, \ldots, x_q))$, denoted $N_{1,3}$, satisfying only (1) and
(3) is at least $(2^n - 2q^2) \times 2^{nq_1}$. So the number of admissible tuples is at least

(2" - 2 q 2 ) x 2 nqi - (2" - 2 q 2 ) x 2 n ^ 1 ~ 1 hq 2 /2. 

We mainly subtract the number of tuples satisfying (1) and (3) and not satisfying
(2) from $N_{1,3}$. So the number of admissible tuples is at least $2^{n(q_1+1)} (1 - \epsilon)$ where
e- 

t ~ 2 ”+ 1 ' 

Now, for any $\tau = ((a_1, b_1, c_1, d_1), \ldots, (a_q, b_q, c_q, d_q)) \notin V_{bad}$ we have

$$\Pr[\tau] \ge \sum_{(L_0, x) \in \mathcal{A}} \Pr[\tau, X_i = x_i, L = L_0] = \sum_{(L_0, x) \in \mathcal{A}} 2^{-n(q_1 + 2q + 1)}.$$

By using the lower bound on the number of admissible tuples we have

$$\Pr[LR_L^{f,3}(a_i, b_i) = (c_i, d_i),\ 1 \le i \le q] \ge (1 - \epsilon) 2^{-2nq}.$$

z □ 

Theorem 2. For any $q$-adversary, the PRP advantage $\mathbf{Adv}^{\mathrm{prp}}_{LR_L^{f,3}}$ against $LR_L^{f,3}$
is at most

Proof. Armed with the above result and using Coefficient-H technique the the- 
orem follows. □ 

6 Conclusion 

In this paper, we formally justify why there are no length-preserving PRP constructions more efficient than three-round LR, and no length-preserving SPRP constructions more efficient than CMC or four-round LR (in terms of the number of building-block calls). We note that this optimality holds for all linear modes. We show that any such linear-mode-based construction over $\ell$ blocks requires at least $2\ell - 1$ blockcipher calls against chosen-plaintext adversaries and at least $2\ell$ blockcipher calls against chosen plaintext-ciphertext adversaries. These bounds are clearly tight, since some known constructions achieve them. Then we look into inverse-free single-key PRP constructions. Nandi has shown that three blockcipher calls are no longer sufficient for LR-type constructions over two blocks (note that three calls are sufficient using two independent PRFs). We extend this result and show that any $\ell$-block single-key inverse-free PRP must require $2\ell$ calls, like SPRP constructions. However, if we are allowed to use one masking key, then we can have an inverse-free PRP construction invoking only three blockcipher calls. We actually show that three-round LR using the same keyed PRF is a PRP if we mask a plaintext block with a masking key.
Abstract. The iterated Even-Mansour construction defines a block cipher from a tuple of public $n$-bit permutations $(P_1, \ldots, P_r)$ by alternately xoring some $n$-bit round key $k_i$, $i = 0, \ldots, r$, and applying permutation $P_i$ to the state. The tweakable Even-Mansour construction generalizes the conventional Even-Mansour construction by replacing the $n$-bit round keys by $n$-bit strings derived from a master key and a tweak, thereby defining a tweakable block cipher. Constructions of this type have been previously analyzed, but they were either secure only up to the birthday bound, or they used a nonlinear mixing function of the key and the tweak (typically, multiplication of the key and the tweak seen as elements of some finite field) which might be costly to implement. In this paper, we tackle the question of whether it is possible to achieve beyond-birthday-bound security for such a construction by using only linear operations for mixing the key and the tweak into the state. We answer positively, describing a 4-round construction with a $2n$-bit master key and an $n$-bit tweak which is provably secure in the Random Permutation Model up to roughly $2^{2n/3}$ adversarial queries.


Keywords: Tweakable block cipher • Iterated Even-Mansour cipher • 
Key-alternating cipher • Beyond-birthday-bound security 


1 Introduction 

Background. A block cipher with key space $\mathcal{K}$ and message space $\mathcal{M}$ is a family of permutations of $\mathcal{M}$ indexed by the key $k \in \mathcal{K}$. A tweakable block cipher (TBC) takes an additional (potentially public) input parameter $t \in \mathcal{T}$ called a tweak, which aims to provide inherent variability in much the same way as an IV or nonce brings variability to an encryption scheme. Some block ciphers such as the Hasty Pudding Cipher [35], Mercy [10], or Threefish (the block cipher underlying the Skein hash function [15]) were designed so as to natively support tweaks. The syntax and security requirements for tweakable block ciphers were formally articulated in a seminal paper by Liskov, Rivest and Wagner [24]. Since then, TBCs have found multiple applications such as (tweakable) length-preserving encryption modes [18,19], online ciphers [1,33], and authenticated encryption modes [24,31,32].

Liskov et al. [24] also proposed two generic constructions of a TBC from a standard block cipher, achieving security up to the so-called birthday bound, i.e., when the adversary is allowed at most roughly $2^{n/2}$ queries to the encryption or decryption oracle, where $n$ is the block size (that is, the message space of the TBC is $\mathcal{M} = \{0,1\}^n$). The “black-box” design strategy (i.e., building a TBC on top of an existing standard block cipher, in a black-box way) has since then been the main avenue of research. Earlier proposals, such as XEX [31] and variants [4,26], were related to the second of the two original proposals of Liskov et al., and were limited to birthday-bound security as well. Recently, a number of constructions achieving beyond-birthday-bound security have emerged, such as Minematsu’s construction [27], the CLRW construction [22,23,30], and two constructions by Mennink [25]. All those constructions enjoy a security proof in the standard model (i.e., assuming that the underlying block cipher is a pseudorandom permutation), except for Mennink’s constructions that were analyzed in the ideal cipher model.

Tweaking Even-Mansour Ciphers. Unfortunately, none of the currently known black-box TBC constructions with beyond-birthday-bound security can be deemed truly practical (even though some of them might come close to it [25]). Hence, it might be beneficial to “open the hood” and to study how to build a TBC from some lower level primitive than a full-fledged conventional block cipher, e.g., a pseudorandom function or a public permutation. For example, Goldenberg et al. [16] investigated how to include a tweak in Feistel ciphers. This was extended to generalized Feistel ciphers by Mitsuda and Iwata [28]. Recently, a similar study was undertaken for the second large class of block ciphers besides Feistel ciphers, namely key-alternating ciphers [11], a super-class of Substitution-Permutation Networks (SPNs). An $r$-round key-alternating cipher based on a tuple of public $n$-bit permutations $(P_1, \ldots, P_r)$ maps a plaintext $x \in \{0,1\}^n$ to the ciphertext defined as

$$y = k_r \oplus P_r(k_{r-1} \oplus P_{r-1}(\cdots P_2(k_1 \oplus P_1(k_0 \oplus x)) \cdots)), \qquad (1)$$

where the $n$-bit round keys $k_0, \ldots, k_r$ are either independent or derived from a master key $k$. When the $P_i$'s are modeled as public permutation oracles, construction (1) is also referred to as the (iterated) Even-Mansour construction, in reference to Even and Mansour who pioneered the analysis of this construction in the Random Permutation Model [13]. While Even and Mansour limited themselves to proving birthday-bound security in the case $r = 1$, larger numbers of rounds were studied in subsequent works [3,21,36]. The general case has been recently (tightly) settled by Chen and Steinberger [6], who proved that the $r$-round iterated Even-Mansour cipher with $r$-wise independent round keys ensures security up to roughly $2^{\frac{rn}{r+1}}$ adversarial queries.

In order to incorporate a tweak $t$ in the iterated Even-Mansour construction, it is tantalizing to generalize (1) by replacing round keys $k_i$ by some function $f_i(k, t)$ of the master key $k$ and the tweak $t$ (see Fig. 1). We will refer to such a construction as a Tweakable Even-Mansour (TEM) construction.$^{1}$ This is exactly the spirit of the TWEAKEY framework introduced by Jean et al. [20]. In fact, these authors go one step further and propose to unify the key and tweak inputs into what they dub the tweakey. The main topic of this paper being provable security (in the traditional model where the key is secret and the tweak is chosen by the adversary), we will not make such a bold move here, since we are not aware of any formal security model adequately capturing what Jean et al. had in mind.

The investigation of the theoretical soundness of this design strategy was initiated in three recent papers. First, Cogliati and Seurin [8], and independently Farshim and Procter [14], analyzed the simple case of an $n$-bit key $k$ and an $n$-bit tweak $t$ simply xored together at each round, i.e., $f_i(k, t) = k \oplus t$ for each $i = 0, \ldots, r$.$^{2}$ They gave attacks up to two rounds, and proved birthday-bound security for three rounds. In fact, the security of this construction caps at $2^{n/2}$ queries independently of the number of rounds. Indeed, it can be written $\widetilde{E}(k, t, x) = E(k \oplus t, x)$, where $E$ is the conventional iterated Even-Mansour cipher with the trivial key-schedule (i.e., the same round key is xored between each round), and by a result of Bellare and Kohno [2, Corollary 5.7], a tweakable block cipher of this form can never offer more than $\kappa/2$ bits of security, where $\kappa$ is the key-length of $E$ (i.e., $\kappa = n$ in the case at hand). Hence, if we want beyond-birthday-bound security, we have no choice but to consider more complex functions $f_i$ (at the bare minimum, these functions, even if linear, should prevent the TBC construction from being of the form $E(k \oplus t, x)$ for some block cipher $E$ with $n$-bit keys).
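The structural weakness described above can be seen on a toy instance. The following is a minimal sketch, assuming 8-bit blocks and table-based random permutations (the helper names `make_permutations`, `em_trivial`, and `tem_xor` are hypothetical and chosen only for this illustration): since the construction depends on the key and tweak only through $k \oplus t$, shifting both by the same offset leaves the output unchanged.

```python
import random

def make_permutations(n_bits, rounds, seed=0):
    """Sample `rounds` random permutations of {0,1}^n_bits as lookup tables."""
    rng = random.Random(seed)
    perms = []
    for _ in range(rounds):
        table = list(range(1 << n_bits))
        rng.shuffle(table)
        perms.append(table)
    return perms

def em_trivial(perms, key, x):
    """Iterated Even-Mansour with the trivial key schedule (same round key everywhere)."""
    state = x ^ key
    for table in perms:
        state = table[state] ^ key
    return state

def tem_xor(perms, key, tweak, x):
    """Tweakable variant with f_i(k, t) = k xor t at every round."""
    return em_trivial(perms, key ^ tweak, x)

PERMS = make_permutations(n_bits=8, rounds=3)
key, tweak, delta, x = 0x3C, 0xA5, 0x5F, 0x42

# The output depends on (key, tweak) only through key ^ tweak.
same = tem_xor(PERMS, key, tweak, x) == tem_xor(PERMS, key ^ delta, tweak ^ delta, x)
print(same)  # True
```

This symmetry is what lets an adversary trade tweak variation for key guesses, which is the intuition behind the $\kappa/2$ cap; the sketch only illustrates the symmetry, not the attack itself.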

This was undertaken by Cogliati, Lampe, and Seurin [7], who considered nonlinear ways of mixing the key and the tweak. More specifically, they studied the case where $f_i(\mathbf{k}, t) = H_{k_i}(t)$, where the family of functions $(H_k)$ is uniform and almost XOR-universal, and the master key is $\mathbf{k} = (k_0, \ldots, k_r)$. A classical example is multiplication-based hashing, i.e., $f_i(\mathbf{k}, t) = k_i \otimes t$, where $\otimes$ denotes the multiplication in the finite field $\mathbb{F}_{2^n}$, the tweak $t = 0$ being forbidden. Cogliati et al. showed that one round is secure up to the birthday bound, and that two rounds are secure up to roughly $2^{2n/3}$ adversarial queries.$^{3}$ They also provided a

1 We warn that the naming Tweakable Even-Mansour construction was previously used by the designers of Minalpher [34], a candidate to the CAESAR competition, to designate a permutation-based variant of Rogaway’s XEX construction [31], i.e., a 1-round Even-Mansour construction where the derivation functions $f_0$ and $f_1$ applied to $(k, t)$ are allowed to depend on the internal permutation $P_1$ (something we do not consider in this paper).

2 Actually, the results of [8,14] were stated in terms of xor-induced related-key security of the (conventional) iterated Even-Mansour cipher, but in this case this is equivalent to standard (i.e., single-key) security of the corresponding tweakable construction.

3 More precisely, the birthday-bound result applies to the variant of the construction where the same key is used before and after permutation $P_1$, and the $2^{2n/3}$-security bound applies to the cascade of this construction with two independent keys and two independent permutations.
(non-tight) asymptotic security bound improving as the number of rounds grows. However, implementing a xor-universal hash function might be costly, and linear functions $f_i$ would be highly preferable for obvious efficiency reasons.

Our Results. In this paper, we ask whether it is possible to come up with a tweakable Even-Mansour construction achieving both:

1. a linear mixing of the tweak and the key into the state;

2. beyond-birthday-bound security.

We answer positively, by providing a construction with $2n$-bit keys and $n$-bit tweaks. The starting point is the 4-round iterated Even-Mansour construction with a $2n$-bit master key $(k_0, k_1)$, $k_0$ and $k_1$ being both $n$ bits, and what we call the “alternating” key schedule, namely round keys are $k_0, k_1, k_0$, etc. This is for example how LED-128 is designed [17]. To turn this block cipher into a tweakable Even-Mansour construction, we simply add the $n$-bit tweak $t$ between each permutation (see Fig. 2). In other words, if we denote $E((k_0, k_1), x)$ the conventional Even-Mansour cipher with alternating round keys, the tweakable construction that we consider can be written

$$\widetilde{E}((k_0, k_1), t, x) = E((k_0 \oplus t, k_1 \oplus t), x).$$

We prove that this construction is secure up to roughly $2^{2n/3}$ adversarial queries. Unsurprisingly, and as in many previous works, our proof uses Patarin’s H-coefficients technique [6,29]. In particular, we rely on a key lemma by Cogliati et al. [7] to analyze so-called good transcripts.
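To make the data flow of the construction concrete, here is a minimal sketch on a toy block size. It assumes 8-bit blocks and lookup-table permutations standing in for the public random permutations; the function names (`random_permutation`, `em_alternating`, `tem_encrypt`, `tem_decrypt`) are hypothetical and not taken from the paper.

```python
import random

N_BITS = 8  # toy block size; the paper's n is a parameter

def random_permutation(rng):
    """Return a random permutation of {0,1}^N_BITS and its inverse, as tables."""
    table = list(range(1 << N_BITS))
    rng.shuffle(table)
    inverse = [0] * len(table)
    for u, v in enumerate(table):
        inverse[v] = u
    return table, inverse

def em_alternating(perms, k0, k1, x):
    """4-round Even-Mansour with the alternating round keys k0, k1, k0, k1, k0."""
    round_keys = [k0, k1, k0, k1, k0]
    state = x ^ round_keys[0]
    for (table, _), rk in zip(perms, round_keys[1:]):
        state = table[state] ^ rk
    return state

def tem_encrypt(perms, k0, k1, t, x):
    """Tweakable construction: xor the tweak into every round key."""
    return em_alternating(perms, k0 ^ t, k1 ^ t, x)

def tem_decrypt(perms, k0, k1, t, y):
    """Inverse direction: undo the rounds from last to first."""
    round_keys = [k0 ^ t, k1 ^ t, k0 ^ t, k1 ^ t, k0 ^ t]
    state = y ^ round_keys[4]
    for i in range(3, -1, -1):
        _, inverse = perms[i]
        state = inverse[state] ^ round_keys[i]
    return state

rng = random.Random(2015)
PERMS = [random_permutation(rng) for _ in range(4)]
k0, k1, t = 0x1F, 0xC3, 0x7A
ciphertexts = [tem_encrypt(PERMS, k0, k1, t, x) for x in range(1 << N_BITS)]
```

For a fixed key and tweak the map is a permutation of the block space, and `tem_decrypt` inverts it; both properties are easy to check exhaustively at this size.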

Application to Related-Key Security. Our result can be rephrased in terms of related-key security [2] of the conventional Even-Mansour cipher: the 4-round conventional Even-Mansour cipher with the alternating key-schedule is secure up to roughly $2^{2n/3}$ adversarial queries against related-key attacks for the set of related-key deriving functions

$$\{(k_0, k_1) \mapsto (k_0 \oplus \Delta, k_1 \oplus \Delta) : \Delta \in \{0,1\}^n\}.$$

Note that this set is more restrictive than the set that would allow xoring an arbitrary $2n$-bit string to the master key $(k_0, k_1)$. It remains an open problem (already stated in [8]) to find an Even-Mansour construction provably secure beyond the birthday bound against related-key attacks for this larger set.

Open Problems. We propose three challenging open problems, the first two being restricted to the case of $n$-bit tweaks. First, what would be the analogue of the Chen-Steinberger result [6] in the tweakable setting? In more detail, we know how to deliver $n/2$ bits of security with an $n$-bit master key [8,14] and this paper shows how to reach $2n/3$ bits of security with a $2n$-bit master key. Hence, it is natural to ask whether one can obtain $rn/(r + 1)$ bits of security from an $rn$-bit master key for $r > 2$, and what would be the adequate number of rounds and the corresponding (linear) “tweak-and-key” schedule. Second, Chen et al. [5] showed that the 2-round conventional Even-Mansour construction can provably deliver $2n/3$ bits of security even with an $n$-bit master key (for example, when the two inner permutations are independent, the trivial key-schedule is sufficient). Again, what would be the analogue of this result in the tweakable setting? Can we design a TEM construction with an $n$-bit master key and an $n$-bit tweak delivering $2n/3$ bits of security, or even more? Finally, it is natural to ask whether one can extend the construction of this paper to handle larger tweaks, in particular $2n$-bit tweaks. We show in the full version of this paper [9] that the naive way of proceeding, namely adding alternately $t_0$ and $t_1$, is insecure for four rounds. Hence, this seems to require at least five rounds.

We also remark that attacks against the (conventional) iterated Even-Mansour cipher with the alternating key-schedule have been investigated by Dinur et al. [12]. It would be interesting to study whether these attacks can be adapted (and potentially improved) in the tweakable setting.

Organization. In Sect. 2, we introduce the notation, the security definitions, 
and give some background on the H-coefficients technique. Our main result is 
proved in Sect. 3. 

2 Preliminaries 

2.1 Notation and General Definitions 

General Notation. In all the following, we fix an integer $n \ge 1$ and denote $N = 2^n$. For integers $1 \le b \le a$, we will write $(a)_b = a(a - 1) \cdots (a - b + 1)$ and $(a)_0 = 1$ by convention. The set of all permutations of $\{0,1\}^n$ will be denoted $\mathcal{P}(n)$.

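Tweakable Block Ciphers. A tweakable block cipher with key space $\mathcal{K}$, tweak space $\mathcal{T}$, and message space $\mathcal{M}$ is a mapping $\widetilde{E} : \mathcal{K} \times \mathcal{T} \times \mathcal{M} \to \mathcal{M}$ such that for any key $k \in \mathcal{K}$ and any tweak $t \in \mathcal{T}$, $x \mapsto \widetilde{E}(k, t, x)$ is a permutation of $\mathcal{M}$. We denote $\mathrm{TBC}(\mathcal{K}, \mathcal{T}, n)$ the set of all tweakable block ciphers with key space $\mathcal{K}$, tweak space $\mathcal{T}$, and message space $\{0,1\}^n$. A tweakable permutation with tweak space $\mathcal{T}$ and message space $\mathcal{M}$ is a mapping $\widetilde{P} : \mathcal{T} \times \mathcal{M} \to \mathcal{M}$ such that for any tweak $t \in \mathcal{T}$, $x \mapsto \widetilde{P}(t, x)$ is a permutation of $\mathcal{M}$. We denote $\mathrm{TP}(\mathcal{T}, n)$ the set of all tweakable permutations with tweak space $\mathcal{T}$ and message space $\{0,1\}^n$.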

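Tweakable Even-Mansour Constructions. Fix integers $n, r \ge 1$. Let $\mathcal{K}$ and $\mathcal{T}$ be two sets, and let $\mathbf{f} = (f_0, \ldots, f_r)$ be an $(r + 1)$-tuple of functions from $\mathcal{K} \times \mathcal{T}$ to $\{0,1\}^n$. The $r$-round tweakable Even-Mansour construction $\mathrm{TEM}[n, r, \mathbf{f}]$ specifies, from an $r$-tuple $\mathbf{P} = (P_1, \ldots, P_r)$ of permutations of $\{0,1\}^n$, a tweakable block cipher with key space $\mathcal{K}$, tweak space $\mathcal{T}$, and message space $\{0,1\}^n$, simply denoted $\mathrm{TEM}^{\mathbf{P}}$ in the following (parameters $[n, r, \mathbf{f}]$ will always be clear from the context), which maps a key $\mathbf{k} \in \mathcal{K}$, a tweak $\mathbf{t} \in \mathcal{T}$, and a plaintext $x \in \{0,1\}^n$ to the ciphertext defined as (see Fig. 1):

$$\mathrm{TEM}^{\mathbf{P}}(\mathbf{k}, \mathbf{t}, x) = f_r(\mathbf{k}, \mathbf{t}) \oplus P_r(f_{r-1}(\mathbf{k}, \mathbf{t}) \oplus P_{r-1}(\cdots P_1(f_0(\mathbf{k}, \mathbf{t}) \oplus x) \cdots)).$$

We will denote $\mathrm{TEM}^{\mathbf{P}}_{\mathbf{k}}$ the mapping taking as input $(\mathbf{t}, x) \in \mathcal{T} \times \{0,1\}^n$ and returning $\mathrm{TEM}^{\mathbf{P}}(\mathbf{k}, \mathbf{t}, x)$.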
We will mostly be interested in the case where $\mathcal{K} = (\{0,1\}^n)^a$ and $\mathcal{T} = (\{0,1\}^n)^b$ for integers $a, b \ge 1$. In this setting, we will denote $\mathbf{k} = (k_0, \ldots, k_{a-1})$ and $\mathbf{t} = (t_0, \ldots, t_{b-1})$, all $k_i$'s and $t_j$'s being $n$-bit strings, or simply $\mathbf{k} = k$, resp. $\mathbf{t} = t$, when $a = 1$, resp. $b = 1$. When all $f_i$'s are linear over $(\{0,1\}^n)^{a+b}$, we say that the construction has linear tweak and key mixing.

Fig. 1. The $r$-round tweakable Even-Mansour construction based on a tuple of public permutations $(P_1, \ldots, P_r)$.


Previously Studied Constructions. Two types of TEM constructions have already been studied. In [8], Cogliati and Seurin considered the simplest case where $a = b = 1$ ($n$-bit keys and $n$-bit tweaks) and $f_i(k, t) = k \oplus t$ for each $i = 0, \ldots, r$. This construction has linear tweak and key mixing, and is secure up to $2^{n/2}$ adversarial queries starting from $r = 3$. (The results of [8] were formulated in terms of xor-induced related-key attacks against the conventional iterated Even-Mansour construction, but in this simple case the two security notions are in fact equivalent.) In [7], Cogliati, Lampe, and Seurin studied a large class of nonlinear mixing functions, in particular, for $n$-bit tweaks, finite field multiplication-based ones, i.e., $f_i(\mathbf{k}, t) = k_i \otimes t$, or more generally, for $bn$-bit tweaks, polynomial hashing-based functions, i.e., $f_i(\mathbf{k}, (t_0, \ldots, t_{b-1})) = \sum_{j=0}^{b-1} k_i^{j+1} \otimes t_j$.


2.2 Security Definitions 

Fix some family of functions $\mathbf{f} = (f_0, \ldots, f_r)$ from $\mathcal{K} \times \mathcal{T}$ to $\{0,1\}^n$. To study the security of the construction $\mathrm{TEM}[n, r, \mathbf{f}]$ in the Random Permutation Model, we consider a distinguisher $\mathcal{D}$ which interacts with $r + 1$ oracles that we denote generically $(\widetilde{P}_0, P_1, \ldots, P_r)$, where syntactically $\widetilde{P}_0$ is a tweakable permutation with tweak space $\mathcal{T}$ and message space $\{0,1\}^n$, and $P_1, \ldots, P_r$ are permutations of $\{0,1\}^n$. The goal of $\mathcal{D}$ is to distinguish two “worlds”: the so-called real world, where $\mathcal{D}$ interacts with $(\mathrm{TEM}^{\mathbf{P}}_{\mathbf{k}}, \mathbf{P})$, where $\mathbf{P} = (P_1, \ldots, P_r)$ is a tuple of public random permutations and the key $\mathbf{k}$ is drawn uniformly at random from $\mathcal{K}$, and the so-called ideal world, where $\mathcal{D}$ interacts with $(\widetilde{P}_0, \mathbf{P})$, where $\widetilde{P}_0$ is a uniformly random tweakable permutation and $\mathbf{P}$ is a tuple of random permutations of $\{0,1\}^n$ independent from $\widetilde{P}_0$. We will refer to $\widetilde{P}_0$ as the construction oracle and to $P_1, \ldots, P_r$ as the inner permutation oracles.
The distinguishing advantage of a distinguisher $\mathcal{D}$ is defined as

$$\mathbf{Adv}(\mathcal{D}) \stackrel{\mathrm{def}}{=} \left| \Pr\left[\mathcal{D}^{\mathrm{TEM}^{\mathbf{P}}_{\mathbf{k}}, \mathbf{P}} = 1\right] - \Pr\left[\mathcal{D}^{\widetilde{P}_0, \mathbf{P}} = 1\right] \right|,$$

where the first probability is taken over the random choice of $\mathbf{k}$ and $\mathbf{P}$, and the second probability is taken over the random choice of $\widetilde{P}_0$ and $\mathbf{P}$. In all the following, we consider computationally unbounded distinguishers, and hence we can assume wlog that they are deterministic. We also assume that they never make pointless queries (i.e., queries whose answers can be unambiguously deduced from previous answers). The distinguisher is allowed to query all oracles adaptively in both directions; this corresponds to adaptive chosen-plaintext and ciphertext attacks (CCA).

For non-negative integers $q_c$ and $q_p$, we define the insecurity of the $\mathrm{TEM}[n, r, \mathbf{f}]$ construction against CCA-attacks as

$$\mathbf{Adv}^{\mathrm{cca}}_{\mathrm{TEM}[n, r, \mathbf{f}]}(q_c, q_p) = \max_{\mathcal{D}} \mathbf{Adv}(\mathcal{D}),$$

where the maximum is taken over all distinguishers making exactly $q_c$ queries to the construction oracle and exactly $q_p$ queries to each inner permutation oracle.

2.3 The H-Coefficients Technique

As in many previous works [5-8], our security proof will use the H-coefficients 
technique [29], which we explain here. 

Transcript. Recall that the distinguisher $\mathcal{D}$ interacts with a tuple of $r + 1$ oracles denoted $(\widetilde{P}_0, P_1, \ldots, P_r)$. In the real world, the construction oracle $\widetilde{P}_0$ is $\mathrm{TEM}^{\mathbf{P}}_{\mathbf{k}}$ where $\mathbf{P} = (P_1, \ldots, P_r)$ and $\mathbf{k}$ is random, whereas in the ideal world it is a random tweakable permutation independent from $(P_1, \ldots, P_r)$. From the interaction of $\mathcal{D}$ with these oracles, we define the queries transcript $(Q_C, Q_{P_1}, \ldots, Q_{P_r})$ of the attack as follows. The list $Q_C$ records the queries to the construction oracle: if $\mathcal{D}$ made either a direct query $(t, x)$ to the construction oracle $\widetilde{P}_0$ which was answered by $y$, or an inverse query $(t, y)$ which was answered by $x$, then the triple $(t, x, y) \in \mathcal{T} \times \{0,1\}^n \times \{0,1\}^n$ is added to $Q_C$. Similarly, for $1 \le i \le r$, $Q_{P_i}$ contains all pairs $(u, v) \in \{0,1\}^n \times \{0,1\}^n$ such that $\mathcal{D}$ made either a direct query $u$ to permutation $P_i$ which was answered by $v$, or an inverse query $v$ which was answered by $u$. Note that queries are recorded in a directionless and unordered way, but by our assumption that the distinguisher is deterministic, the raw interaction of $\mathcal{D}$ with its oracles can unambiguously be reconstructed from the queries transcript (see e.g. [6] for more details). Note also that by our assumption that $\mathcal{D}$ never makes pointless queries, each query to the construction oracle results in a distinct triple in $Q_C$, and each query to $P_i$ results in a distinct pair in $Q_{P_i}$. Moreover, since we assume that the distinguisher always makes the maximal number of allowed queries to each oracle, one has $|Q_C| = q_c$ and $|Q_{P_i}| = q_p$ for $1 \le i \le r$. In all the following, we also denote $m$ the number of distinct tweaks appearing in $Q_C$, and $q_i$ the number of queries for the $i$-th tweak, $1 \le i \le m$, ordering the tweaks arbitrarily. Note that one always has $\sum_{i=1}^{m} q_i = q_c$, even though $m$ may depend on the answers received from the oracles.

A queries transcript is said to be attainable (with respect to some fixed distinguisher $\mathcal{D}$) if there exist oracles $(\widetilde{P}_0, \mathbf{P})$ such that the interaction of $\mathcal{D}$ with $(\widetilde{P}_0, \mathbf{P})$ results in this transcript (in other words, the probability to obtain this transcript in the ideal world is non-zero). Moreover, in order to have a simple definition of bad transcripts, the actual key $\mathbf{k}$ is revealed to the adversary at the end of the experiment if we are in the real world, while in the ideal world, a “dummy” key $\mathbf{k} \leftarrow_\$ \mathcal{K}$ is simply drawn uniformly at random independently from the answers of the oracle $\widetilde{P}_0$ (this is obviously without loss of generality since this can only help the distinguisher and increase its advantage). All in all, a transcript $\tau$ is a tuple $\tau = (Q_C, Q_{P_1}, \ldots, Q_{P_r}, \mathbf{k})$, and we say that a transcript is attainable if the corresponding queries transcript $(Q_C, Q_{P_1}, \ldots, Q_{P_r})$ is attainable. We denote $\Theta$ the set of attainable transcripts. In all the following, we denote $T_{\mathrm{re}}$, resp. $T_{\mathrm{id}}$, the probability distribution of the transcript $\tau$ induced by the real world, resp. the ideal world (note that these two probability distributions depend on the distinguisher). By extension, we use the same notation to denote a random variable distributed according to each distribution. The main lemma of the H-coefficients technique is the following one (see e.g. [5,6] for the proof).


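Lemma 1. Fix a distinguisher $\mathcal{D}$. Let $\Theta = \Theta_{\mathrm{good}} \sqcup \Theta_{\mathrm{bad}}$ be a partition of the set of attainable transcripts. Assume that there exists $\varepsilon_1$ such that for any $\tau \in \Theta_{\mathrm{good}}$, one has$^{4}$

$$\frac{\Pr[T_{\mathrm{re}} = \tau]}{\Pr[T_{\mathrm{id}} = \tau]} \ge 1 - \varepsilon_1,$$

and that there exists $\varepsilon_2$ such that $\Pr[T_{\mathrm{id}} \in \Theta_{\mathrm{bad}}] \le \varepsilon_2$. Then $\mathbf{Adv}(\mathcal{D}) \le \varepsilon_1 + \varepsilon_2$.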
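As a sanity check of how Lemma 1 is used, the following sketch evaluates both sides on a hand-made pair of transcript distributions. The distributions, the transcript labels, and the choice of the bad set are arbitrary assumptions for illustration; the statistical distance between the two distributions plays the role of the best possible advantage of an unbounded distinguisher.

```python
from fractions import Fraction as F

# Hypothetical transcript distributions over four attainable transcripts.
ideal = {"t1": F(5, 20), "t2": F(5, 20), "t3": F(5, 20), "t4": F(5, 20)}
real = {"t1": F(7, 20), "t2": F(6, 20), "t3": F(4, 20), "t4": F(3, 20)}
bad = {"t4"}  # arbitrary choice of bad transcripts

def h_coefficients_bound(real, ideal, bad):
    """Smallest eps1 and eps2 satisfying the two hypotheses of the lemma."""
    good = [tau for tau in ideal if tau not in bad]
    eps1 = max(F(0), max(1 - real.get(tau, F(0)) / ideal[tau] for tau in good))
    eps2 = sum(ideal[tau] for tau in bad)
    return eps1, eps2

def statistical_distance(real, ideal):
    """Half the L1 distance between the two distributions."""
    support = set(real) | set(ideal)
    return sum(abs(real.get(t, F(0)) - ideal.get(t, F(0))) for t in support) / 2

eps1, eps2 = h_coefficients_bound(real, ideal, bad)
distance = statistical_distance(real, ideal)
print(eps1, eps2, distance)  # 1/5 1/4 3/20
```

Here the distance $3/20$ is indeed at most $\varepsilon_1 + \varepsilon_2 = 9/20$; the bound is loose on this toy example, and a different choice of bad set would change both terms.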


Useful Observations. We end this section with some useful preliminary observations. First, we introduce some additional notation. Given a permutation queries transcript $Q$ and a permutation $P$, we say that $P$ extends $Q$, denoted $P \vdash Q$, if $P(u) = v$ for all $(u, v) \in Q$. By extension, given a tuple of permutation queries transcripts $Q_P = (Q_{P_1}, \ldots, Q_{P_r})$ and a tuple of permutations $\mathbf{P} = (P_1, \ldots, P_r)$, we say that $\mathbf{P}$ extends $Q_P$, denoted $\mathbf{P} \vdash Q_P$, if $P_i \vdash Q_{P_i}$ for each $i = 1, \ldots, r$. Note that for a permutation transcript of size $q_p$, one has

$$\Pr[P \leftarrow_\$ \mathcal{P}(n) : P \vdash Q] = \frac{1}{(N)_{q_p}}. \qquad (2)$$

Similarly, given a tweakable permutation transcript $\widetilde{Q}$ and a tweakable permutation $\widetilde{P}$, we say that $\widetilde{P}$ extends $\widetilde{Q}$, denoted $\widetilde{P} \vdash \widetilde{Q}$, if $\widetilde{P}(t, x) = y$ for all $(t, x, y) \in \widetilde{Q}$. For a tweakable permutation transcript $\widetilde{Q}$ with $m$ distinct tweaks and $q_i$ queries corresponding to the $i$-th tweak, one has

$$\Pr[\widetilde{P} \leftarrow_\$ \mathrm{TP}(\mathcal{T}, n) : \widetilde{P} \vdash \widetilde{Q}] = \prod_{i=1}^{m} \frac{1}{(N)_{q_i}}. \qquad (3)$$
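Equations (2) and (3) can be checked by brute force for a tiny block size. The sketch below assumes $n = 2$ (so $N = 4$ and there are only 24 permutations) and uses hypothetical helper names; it enumerates every permutation, counts those that extend a given transcript, and compares the ratio with the falling-factorial formula.

```python
from fractions import Fraction
from itertools import permutations, product

def falling_factorial(a, b):
    """(a)_b = a (a - 1) ... (a - b + 1), with (a)_0 = 1."""
    result = 1
    for i in range(b):
        result *= a - i
    return result

def extends(perm, transcript):
    """True if perm(u) == v for every recorded pair (u, v)."""
    return all(perm[u] == v for u, v in transcript)

def extension_probability(size, transcript):
    """Probability that a uniform permutation of range(size) extends the transcript."""
    all_perms = list(permutations(range(size)))
    hits = sum(1 for perm in all_perms if extends(perm, transcript))
    return Fraction(hits, len(all_perms))

def tweakable_extension_probability(size, per_tweak):
    """Same for a tweakable permutation: one independent permutation per tweak."""
    all_perms = list(permutations(range(size)))
    transcripts = list(per_tweak.values())
    hits = sum(1 for choice in product(all_perms, repeat=len(transcripts))
               if all(extends(p, q) for p, q in zip(choice, transcripts)))
    return Fraction(hits, len(all_perms) ** len(transcripts))

N = 4  # n = 2
Q = [(0, 1), (1, 3)]
prob_single = extension_probability(N, Q)

Q_tweak = {0: [(0, 1), (1, 3)], 1: [(2, 2)]}  # tweak -> list of (x, y) pairs
prob_tweak = tweakable_extension_probability(N, Q_tweak)
print(prob_single, prob_tweak)  # 1/12 1/48
```

The two values match $1/(4)_2$ and $1/(4)_2 \times 1/(4)_1$ respectively, as predicted by (2) and (3).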


4 Recall that for an attainable transcript, one has $\Pr[T_{\mathrm{id}} = \tau] > 0$.
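It is easy to see that the interaction of a distinguisher $\mathcal{D}$ with oracles $(\widetilde{P}_0, P_1, \ldots, P_r)$ yields any attainable queries transcript $(Q_C, Q_P)$ with $Q_P = (Q_{P_1}, \ldots, Q_{P_r})$ iff $\widetilde{P}_0 \vdash Q_C$ and $P_i \vdash Q_{P_i}$ for $1 \le i \le r$. In the ideal world, the key $\mathbf{k}$, the permutations $P_1, \ldots, P_r$, and the tweakable permutation $\widetilde{P}_0$ are all uniformly random and independent, so that, by (2) and (3), the probability of getting any attainable transcript $\tau = (Q_C, Q_P, \mathbf{k})$ in the ideal world is

$$\Pr[T_{\mathrm{id}} = \tau] = \frac{1}{|\mathcal{K}|} \times \left(\frac{1}{(N)_{q_p}}\right)^{r} \times \prod_{i=1}^{m} \frac{1}{(N)_{q_i}}.$$

In the real world, the probability to obtain $\tau$ is

$$\Pr[T_{\mathrm{re}} = \tau] = \frac{1}{|\mathcal{K}|} \times \left(\frac{1}{(N)_{q_p}}\right)^{r} \times \Pr\left[\mathbf{P} \leftarrow_\$ (\mathcal{P}(n))^r : \mathrm{TEM}^{\mathbf{P}}_{\mathbf{k}} \vdash Q_C \,\middle|\, \mathbf{P} \vdash Q_P\right].$$

Let

$$p(\tau) \stackrel{\mathrm{def}}{=} \Pr\left[\mathbf{P} \leftarrow_\$ (\mathcal{P}(n))^r : \mathrm{TEM}^{\mathbf{P}}_{\mathbf{k}} \vdash Q_C \,\middle|\, \mathbf{P} \vdash Q_P\right].$$

Then we have

$$\frac{\Pr[T_{\mathrm{re}} = \tau]}{\Pr[T_{\mathrm{id}} = \tau]} = p(\tau) \cdot \prod_{i=1}^{m} (N)_{q_i}. \qquad (4)$$

Hence, applying Lemma 1 will require three steps: first, define good and bad transcripts, then upper bound the probability of bad transcripts in the ideal world, and finally lower bound the real world probability $p(\tau)$ when $\tau$ is good in order to use Eq. (4).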


2.4 An Extended Sum-Capture Lemma 

To upper bound the probability of getting a bad transcript in the ideal world, we will need a generalization of the sum-capture theorem from [5] (that applied to a random permutation) to the case of a family of random permutations (in other words, a random tweakable permutation).

We denote $\mathrm{GL}(n)$ the general linear group of degree $n$ over $\mathbb{F}_2$, i.e., the set of all automorphisms (linear bijective mappings) of $\mathbb{F}_2^n$.

Lemma 2. Fix an automorphism $\Gamma \in \mathrm{GL}(n)$ and a non-empty set $\mathcal{T}$. Let $\widetilde{P}$ be a uniformly random tweakable permutation in $\mathrm{TP}(\mathcal{T}, n)$, and let $\mathcal{A}$ be some probabilistic algorithm making exactly $q$ (two-sided) adaptive queries to $\widetilde{P}$. Let $Q = ((t_1, x_1, y_1), \ldots, (t_q, x_q, y_q))$ denote the transcript of the interaction of $\mathcal{A}$ with $\widetilde{P}$. For any two subsets $U$ and $V$ of $\{0,1\}^n$, let

$$\mu(Q, U, V) = |\{((t, x, y), u, v) \in Q \times U \times V : x \oplus u = \Gamma(y \oplus v)\}|.$$

Then, assuming $9n \le q \le N/2$, one has

$$\Pr_{\widetilde{P}, \omega}\left[\exists U, V \subseteq \{0,1\}^n : \mu(Q, U, V) \ge \frac{q|U||V|}{N} + \frac{2q^2\sqrt{|U||V|}}{N} + 3\sqrt{nq|U||V|}\right] \le \frac{2}{N},$$

where the probability is taken over the random choice of $\widetilde{P}$ and the random coins $\omega$ of $\mathcal{A}$.

The proof of this lemma is a simple generalization of the one from [5] and can be found in the full version of this paper [9].
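The quantity $\mu(Q, U, V)$ is a plain count over triples, which the following sketch computes directly. The transcript, the sets, and the helper names are hypothetical toy values on 3-bit strings; the optional `gamma` argument stands for the automorphism in the lemma and defaults to the identity, and a one-bit rotation is used as an example of a linear bijection.

```python
from itertools import product

def mu(transcript, U, V, gamma=lambda z: z):
    """Count ((t, x, y), u, v) in Q x U x V with x ^ u == gamma(y ^ v)."""
    return sum(1 for (t, x, y), u, v in product(transcript, U, V)
               if x ^ u == gamma(y ^ v))

def rotl3(z):
    """Rotate a 3-bit value left by one position (linear and bijective over F_2^3)."""
    return ((z << 1) | (z >> 2)) & 0b111

# Toy transcript of (tweak, input, output) triples on 3-bit strings.
Q = [(0, 1, 2), (0, 3, 0), (1, 1, 7)]
U = [0, 1]
V = [2, 3]

count_identity = mu(Q, U, V)
count_rotation = mu(Q, U, V, gamma=rotl3)
print(count_identity)  # 4
```

The toy sizes here are far below the range $9n \le q \le N/2$ required by the lemma, so the sketch only illustrates the definition of $\mu$, not the probability bound.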

3 Beyond-Birthday-Bound Security 

3.1 Statement of the Result and Discussion 

In this section, we consider the 4-round tweakable Even-Mansour construction $\mathrm{TEM}[n, 4, \mathbf{f}]$ with $2n$-bit keys and $n$-bit tweaks depicted on Fig. 2. The main result of this paper is the following one:

Theorem 1. Let $\mathbf{f} = (f_0, \ldots, f_4)$ where $f_i((k_0, k_1), t) = k_{i \bmod 2} \oplus t$. Let $q_c, q_p$ be two integers such that $9n \le q_c$ and $q_p + 3q_c + 1 \le N/2$. Then one has

$$\mathbf{Adv}^{\mathrm{cca}}_{\mathrm{TEM}[n, 4, \mathbf{f}]}(q_c, q_p) \le \frac{44 q_c^{3/2} + 38 q_c \sqrt{q_p} + (30 + 3\sqrt{n}) q_p \sqrt{q_c} + 4 q_p^{3/2} + 2}{N}.$$

Hence, this construction ensures CCA-security as long as $q_c$ and $q_p$ are small compared to $2^{2n/3}$, up to logarithmic terms in $N = 2^n$.
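To get a feel for where the bound of Theorem 1 becomes vacuous, one can evaluate it numerically. The sketch below is an assumption-laden reading of the theorem: it takes the bound to be the sum $44 q_c^{3/2} + 38 q_c\sqrt{q_p} + (30 + 3\sqrt{n}) q_p\sqrt{q_c} + 4 q_p^{3/2} + 2$ divided by $N$, which is how the statement reads here but should be checked against the published version; the function names are hypothetical.

```python
import math

def tem4_cca_bound(n, qc, qp):
    """Evaluate the (assumed) Theorem 1 bound for block size n and query budgets qc, qp."""
    numerator = (44 * qc ** 1.5
                 + 38 * qc * math.sqrt(qp)
                 + (30 + 3 * math.sqrt(n)) * qp * math.sqrt(qc)
                 + 4 * qp ** 1.5
                 + 2)
    return numerator / 2.0 ** n

def preconditions_hold(n, qc, qp):
    """Hypotheses of the theorem: 9n <= qc and qp + 3qc + 1 <= N/2."""
    return 9 * n <= qc and qp + 3 * qc + 1 <= 2 ** n // 2

n = 128
small = tem4_cca_bound(n, 2 ** 64, 2 ** 64)   # well below 2^(2n/3)
large = tem4_cca_bound(n, 2 ** 86, 2 ** 86)   # just above 2^(2n/3)
print(preconditions_hold(n, 2 ** 64, 2 ** 64), small < 1 < large)
```

With $n = 128$ the threshold $2^{2n/3}$ is about $2^{85.3}$: the evaluated bound is tiny at $2^{64}$ queries and exceeds 1 at $2^{86}$, which is the behaviour the sentence after the theorem describes.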

The proof follows the H-coefficients method exposed in Sect. 2.3. In Sect. 3.2, we begin by describing the set of bad transcripts and upper bound the probability to get such a transcript in the ideal world. Then, for any good attainable transcript $\tau$, we prove in Sect. 3.3 that the ratio between the probability to get $\tau$ in the real world and in the ideal world is close enough to 1.

Fig. 2. The 4-round tweakable Even-Mansour construction with a $2n$-bit key $(k_0, k_1)$ and an $n$-bit tweak $t$.


3.2 Definition and Probability of Bad Transcripts 

The first step is to define the set of bad transcripts. Let $\tau = (Q_C, Q_{P_1}, \ldots, Q_{P_4}, (k_0, k_1))$ be an attainable transcript, with $|Q_C| = q_c$ and $|Q_{P_i}| = q_p$ for $i = 1, \ldots, 4$. In all the following, we let, for $i \in \{1, \ldots, 4\}$,

$$U_i = \{u_i \in \{0,1\}^n : (u_i, v_i) \in Q_{P_i}\}$$

$$V_i = \{v_i \in \{0,1\}^n : (u_i, v_i) \in Q_{P_i}\}$$

denote the domains and ranges of $Q_{P_i}$ respectively. We also define three quantities characterizing the transcript,

$$\alpha_1 \stackrel{\mathrm{def}}{=} |\{((t, x, y), u_1) \in Q_C \times U_1 : x \oplus k_0 \oplus t = u_1\}|$$

$$\alpha_4 \stackrel{\mathrm{def}}{=} |\{((t, x, y), v_4) \in Q_C \times V_4 : y \oplus k_0 \oplus t = v_4\}|$$

$$\alpha_{2,3} \stackrel{\mathrm{def}}{=} |\{((t, x, y), v_2, u_3) \in Q_C \times V_2 \times U_3 : v_2 \oplus k_0 \oplus t = u_3\}|.$$

We also define two quantities depending respectively on $Q_{P_2}$ and $Q_{P_3}$:

$$\nu_2 \stackrel{\mathrm{def}}{=} |\{((u_2, v_2), (u_2', v_2')) \in (Q_{P_2})^2 : (u_2, v_2) \ne (u_2', v_2'),\ u_2 \oplus v_2 = u_2' \oplus v_2'\}|$$

$$\nu_3 \stackrel{\mathrm{def}}{=} |\{((u_3, v_3), (u_3', v_3')) \in (Q_{P_3})^2 : (u_3, v_3) \ne (u_3', v_3'),\ u_3 \oplus v_3 = u_3' \oplus v_3'\}|.$$

Definition 1. We say that a transcript $\tau$ is bad if at least one of the following conditions is fulfilled:

(B-1) there exists $(t, x, y) \in Q_C$, $(u_1, v_1) \in Q_{P_1}$, and $(u_4, v_4) \in Q_{P_4}$ such that $k_0 \oplus t = x \oplus u_1 = v_4 \oplus y$;

(B-2) there exists $(t, x, y) \in Q_C$, $(u_1, v_1) \in Q_{P_1}$, and $(u_2, v_2) \in Q_{P_2}$ such that $k_0 \oplus t = x \oplus u_1$ and $k_1 \oplus t = v_1 \oplus u_2$;

(B-3) there exists $(t, x, y) \in Q_C$, $(u_3, v_3) \in Q_{P_3}$, and $(u_4, v_4) \in Q_{P_4}$ such that $k_1 \oplus t = v_3 \oplus u_4$ and $k_0 \oplus t = v_4 \oplus y$;

(B-4) $\alpha_1 \ge \sqrt{q_c}/2$;

(B-5) $\alpha_4 \ge \sqrt{q_c}/2$;

(B-6) $\alpha_{2,3} \ge q_p\sqrt{q_c}$;

(B-7) $\nu_2 \ge \sqrt{q_p}$;

(B-8) $\nu_3 \ge \sqrt{q_p}$.

Otherwise we say that $\tau$ is good.$^{5}$ We denote $\Theta_{\mathrm{good}}$, resp. $\Theta_{\mathrm{bad}}$, the set of good, resp. bad transcripts.
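The conditions of Definition 1 are all simple counts or existence checks over the transcript, so they can be evaluated mechanically. The sketch below is one possible reading, assuming non-strict thresholds ($\ge$) in (B-4) to (B-8) and using hypothetical toy transcripts on 4-bit strings; the function names are illustrative only.

```python
from itertools import product
from math import sqrt

def domain(QP):
    return {u for u, _ in QP}

def image(QP):
    return {v for _, v in QP}

def nu(QP):
    """Ordered pairs of distinct queries (u, v), (u', v') with u ^ v == u' ^ v'."""
    return sum(1 for a, b in product(QP, repeat=2)
               if a != b and a[0] ^ a[1] == b[0] ^ b[1])

def violated_conditions(QC, QP, k0, k1):
    """Return the labels of the conditions (B-1)..(B-8) met by the transcript."""
    QP1, QP2, QP3, QP4 = QP
    U1, U2, U3 = domain(QP1), domain(QP2), domain(QP3)
    V2, V3, V4 = image(QP2), image(QP3), image(QP4)
    qc, qp = len(QC), len(QP1)

    alpha1 = sum(1 for t, x, y in QC if (x ^ k0 ^ t) in U1)
    alpha4 = sum(1 for t, x, y in QC if (y ^ k0 ^ t) in V4)
    alpha23 = sum(1 for (t, x, y), v2 in product(QC, V2) if (v2 ^ k0 ^ t) in U3)

    checks = {
        "B-1": any((x ^ k0 ^ t) in U1 and (y ^ k0 ^ t) in V4 for t, x, y in QC),
        "B-2": any(u1 == (x ^ k0 ^ t) and (v1 ^ k1 ^ t) in U2
                   for (t, x, y), (u1, v1) in product(QC, QP1)),
        "B-3": any(v4 == (y ^ k0 ^ t) and (u4 ^ k1 ^ t) in V3
                   for (t, x, y), (u4, v4) in product(QC, QP4)),
        "B-4": alpha1 >= sqrt(qc) / 2,
        "B-5": alpha4 >= sqrt(qc) / 2,
        "B-6": alpha23 >= qp * sqrt(qc),
        "B-7": nu(QP2) >= sqrt(qp),
        "B-8": nu(QP3) >= sqrt(qp),
    }
    return [label for label, hit in checks.items() if hit]

# Toy data on 4-bit strings (all values are arbitrary).
QC = [(0, 1, 9), (0, 2, 5), (1, 3, 6), (1, 4, 8)]
QP2, QP3 = [(0, 1), (2, 7)], [(3, 3), (5, 9)]
good = violated_conditions(QC, [[(0, 3), (1, 8)], QP2, QP3, [(4, 1), (9, 3)]], 7, 2)
bad = violated_conditions(QC, [[(6, 3), (1, 8)], QP2, QP3, [(4, 14), (9, 3)]], 7, 2)
print(good, bad)  # [] ['B-1', 'B-4', 'B-5']
```

In the second transcript the first construction query collides with both $U_1$ and $V_4$ under the key $k_0 = 7$, which triggers (B-1) and, at this tiny scale, also the counting conditions (B-4) and (B-5).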

We start by upper bounding the probability of getting bad transcripts in the ideal world.

Lemma 3. Assume that $9n \le q_c \le N/2$ and $q_p \le N/2$. Then one has

$$\Pr[T_{\mathrm{id}} \in \Theta_{\mathrm{bad}}] \le \frac{3 q_c q_p^2}{N^2} + \frac{2 q_c^2 q_p}{N^2} + \frac{(5 + 3\sqrt{n})\sqrt{q_c}\, q_p + 4 q_p^{3/2} + 2}{N}.$$

Proof. We upper bound the probability of each condition in turn. We denote $\Theta_i$ the set of attainable transcripts satisfying condition (B-i). Recall that in the ideal world, the key $(k_0, k_1)$ is drawn independently from the queries transcript.

5 We define conditions (B-4) and (B-5) using $\sqrt{q_c}/2$ rather than $\sqrt{q_c}$ in order to be able later to directly apply a previous result by Cogliati et al. [7].
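Condition (B-1). Let $\mathrm{BadK}_1$ be the set of keys $k_0$ such that there exists $(t, x, y) \in Q_C$, $(u_1, v_1) \in Q_{P_1}$, and $(u_4, v_4) \in Q_{P_4}$ such that $k_0 \oplus t = x \oplus u_1 = y \oplus v_4$. Note that $\mathrm{BadK}_1$ only depends on the queries transcript, hence for any constant $C$ we have, since $k_0$ is uniformly random,

$$\Pr[T_{\mathrm{id}} \in \Theta_1] \le \Pr\left[\widetilde{P}_0 \leftarrow_\$ \mathrm{TP}(\mathcal{T}, n), \mathbf{P} \leftarrow_\$ (\mathcal{P}(n))^4 : |\mathrm{BadK}_1| \ge C\right] + \frac{C}{N}. \qquad (5)$$

Moreover, if we let

$$\mu(Q_C, U_1, V_4) = |\{((t, x, y), u_1, v_4) \in Q_C \times U_1 \times V_4 : x \oplus u_1 = y \oplus v_4\}|,$$

then one clearly has

$$|\mathrm{BadK}_1| \le \mu(Q_C, U_1, V_4).$$

Hence, we can use Lemma 2 in order to upper-bound $|\mathrm{BadK}_1|$ with overwhelming probability (we consider $\mathcal{D}$ with access to the inner permutations as a probabilistic algorithm $\mathcal{A}$ interacting with the tweakable permutation $\widetilde{P}_0$, resulting in the transcript $Q_C$, and we let $\Gamma$ be the identity mapping). For

$$C = \frac{q_c q_p^2}{N} + \frac{2 q_c^2 q_p}{N} + 3 q_p \sqrt{n q_c},$$

we obtain that

$$\Pr\left[\widetilde{P}_0 \leftarrow_\$ \mathrm{TP}(\mathcal{T}, n), \mathbf{P} \leftarrow_\$ (\mathcal{P}(n))^4 : |\mathrm{BadK}_1| \ge C\right] \le \frac{2}{N}.$$

Using (5) gives

$$\Pr[T_{\mathrm{id}} \in \Theta_1] \le \frac{q_c q_p^2}{N^2} + \frac{2 q_c^2 q_p}{N^2} + \frac{3 q_p \sqrt{n q_c}}{N} + \frac{2}{N}.$$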


Conditions (B-2) and (B-3). We consider (B-2). For each $(t, x, y) \in Q_C$, $(u_1, v_1) \in Q_{P_1}$, and $(u_2, v_2) \in Q_{P_2}$, the probability, over the random draw of $(k_0, k_1)$, that $k_0 \oplus t = x \oplus u_1$ and $k_1 \oplus t = v_1 \oplus u_2$ is $1/N^2$ since $(k_0, k_1)$ is uniform and independent from the queries transcript. Summing over the $q_c q_p^2$ possibilities for $(t, x, y)$, $(u_1, v_1)$, and $(u_2, v_2)$ yields

$$\Pr[T_{\mathrm{id}} \in \Theta_2] \le \frac{q_c q_p^2}{N^2}.$$

Similarly,

$$\Pr[T_{\mathrm{id}} \in \Theta_3] \le \frac{q_c q_p^2}{N^2}.$$

Conditions (B-4) and (B-5). We consider (B-4). Seeing $\alpha_1$ as a random variable over the random draw of $(k_0, k_1)$, one has

$$\mathbb{E}[\alpha_1] = \sum_{(t, x, y) \in Q_C} \sum_{u_1 \in U_1} \Pr[k_0 = x \oplus u_1 \oplus t] \le \frac{q_c q_p}{N}.$$

Then, using Markov’s inequality,

$$\Pr[T_{\mathrm{id}} \in \Theta_4] = \Pr\left[\alpha_1 \ge \frac{\sqrt{q_c}}{2}\right] \le \frac{2\mathbb{E}[\alpha_1]}{\sqrt{q_c}} \le \frac{2 q_p \sqrt{q_c}}{N}.$$

Similarly,

$$\Pr[T_{\mathrm{id}} \in \Theta_5] \le \frac{2 q_p \sqrt{q_c}}{N}.$$
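The Markov step for (B-4) can be checked exhaustively on a small example by enumerating every value of $k_0$. The sketch below assumes 8-bit blocks, a hypothetical construction transcript with $q_c = 16$ entries, and a hypothetical set $U_1$ of $q_p = 4$ values; it computes the exact expectation of $\alpha_1$ and the exact probability of the event $\alpha_1 \ge \sqrt{q_c}/2$.

```python
from fractions import Fraction
from math import isqrt

N_BITS = 8
N = 1 << N_BITS

# Hypothetical construction transcript: 4 tweaks, 4 inputs each (y is a placeholder).
QC = [(t, x, x ^ 0x55) for t in range(4) for x in range(4)]
U1 = {3, 17, 64, 200}          # hypothetical domain of the queries to P_1
q_c, q_p = len(QC), len(U1)

def alpha_1(k0):
    """Number of pairs ((t, x, y), u1) with x ^ k0 ^ t == u1."""
    return sum(1 for t, x, _ in QC if (x ^ k0 ^ t) in U1)

values = [alpha_1(k0) for k0 in range(N)]
mean = Fraction(sum(values), N)

# sqrt(q_c) / 2 is exact here because q_c = 16 is a perfect square.
threshold = Fraction(isqrt(q_c), 2)
tail = Fraction(sum(1 for a in values if a >= threshold), N)
markov = 2 * mean / isqrt(q_c)
print(mean, markov)  # 1/4 1/8
```

The expectation equals $q_c q_p / N$ exactly, because each pair is hit by exactly one key, and the measured tail probability is at most the Markov bound $2 q_p \sqrt{q_c} / N$.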

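Condition (B-6). Again, we see $\alpha_{2,3}$ as a random variable over the random draw of $k_0$. Then

$$\mathbb{E}[\alpha_{2,3}] = \sum_{(t, x, y) \in Q_C} \sum_{v_2 \in V_2} \sum_{u_3 \in U_3} \Pr[k_0 = v_2 \oplus u_3 \oplus t] \le \frac{q_c q_p^2}{N}.$$

Then, using Markov’s inequality,

$$\Pr[T_{\mathrm{id}} \in \Theta_6] = \Pr\left[\alpha_{2,3} \ge q_p\sqrt{q_c}\right] \le \frac{\mathbb{E}[\alpha_{2,3}]}{q_p\sqrt{q_c}} \le \frac{q_p\sqrt{q_c}}{N}.$$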

Conditions (B-7) and (B-8). Consider (B-7). We see the distinguisher combined with $\widetilde{P}_0$ and the inner permutations $P_1$, $P_3$, and $P_4$ as a probabilistic algorithm $\mathcal{A}$ interacting with $P_2$, and we see $\nu_2$ as a random variable over the random choice of $P_2$ and the randomness of $\mathcal{A}$. One has

$$\mathbb{E}[\nu_2] = \sum_{\substack{(i, j) \\ 1 \le i \ne j \le q_p}} \Pr[u_{2,i} \oplus v_{2,i} = u_{2,j} \oplus v_{2,j}],$$

where the queries to $P_2$ are ordered as they are issued by $\mathcal{A}$. Consider the $i$-th and the $j$-th query, and assume wlog that $i < j$. If the $j$-th is a direct query $u_{2,j}$, then $v_{2,j}$ is uniformly random in a set of size $N - j + 1$. Similarly, if this is an inverse query then $u_{2,j}$ is uniformly random in a set of size $N - j + 1$. In all cases, the probability that $u_{2,i} \oplus v_{2,i} = u_{2,j} \oplus v_{2,j}$ is at most $1/(N - q_p)$. Hence,

$$\mathbb{E}[\nu_2] \le \frac{q_p(q_p - 1)}{N - q_p} \le \frac{2 q_p^2}{N}.$$

Using Markov’s inequality,

$$\Pr[T_{\mathrm{id}} \in \Theta_7] = \Pr\left[\nu_2 \ge \sqrt{q_p}\right] \le \frac{2 q_p^{3/2}}{N}.$$

Similarly,

$$\Pr[T_{\mathrm{id}} \in \Theta_8] \le \frac{2 q_p^{3/2}}{N}.$$

The result follows by a union bound over all cases. □
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3.3 Analysis of Good Transcripts 

In this section, we fix a good transcript $\tau = (Q_C, Q_{P_1}, \ldots, Q_{P_4}, (k_0, k_1))$. By (4), we have to lower bound

$$p(\tau) \stackrel{\mathrm{def}}{=} \Pr\left[\mathbf{P} \leftarrow_\$ (\mathsf{P}(n))^4 : \mathrm{TEM}^{\mathbf{P}}_{k_0,k_1} \vdash Q_C \wedge P_1 \vdash Q_{P_1} \wedge \cdots \wedge P_4 \vdash Q_{P_4}\right].$$

The proof will proceed in two steps: first, we will lower bound the probability that permutations $P_1$ and $P_4$ satisfy some conditions given in the definition below, and then, assuming $(P_1,P_4)$ is good, we will lower bound the probability, over the choice of $P_2$ and $P_3$, that $\mathrm{TEM}^{\mathbf{P}}_{k_0,k_1} \vdash Q_C$. For this second step, we will directly appeal to a previous result by Cogliati et al. [7].

We start by giving the conditions defining good pairs of permutations $(P_1,P_4)$. We stress that these conditions cannot be accommodated in the definition of bad transcripts since they depend on values of $P_1$ and $P_4$ which do not appear in the queries transcript, so that they cannot be defined from the transcript $\tau$ alone. We also warn the reader upfront that conditions (C-5) and (C-6) are "dummy" conditions that will easily be seen to be impossible to fulfill, yet will allow us to cleanly use the previous result of Cogliati et al. [7].

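Definition 2. A pair of permutations $(P_1,P_4)$ such that $P_1 \vdash Q_{P_1}$ and $P_4 \vdash Q_{P_4}$ is said bad if at least one of the following conditions is fulfilled (see Fig. 3 for a diagram of the first ten conditions):

(C-1) there exists $(t,x,y) \in Q_C$, $u_2 \in U_2$, and $v_3 \in V_3$ such that

$$\begin{cases} P_1(x \oplus k_0 \oplus t) \oplus k_1 \oplus t = u_2 \\ P_4^{-1}(y \oplus k_0 \oplus t) \oplus k_1 \oplus t = v_3; \end{cases}$$

(C-2) there exists $(t,x,y) \in Q_C$, $(u_2,v_2) \in Q_{P_2}$, and $u_3 \in U_3$ such that

$$\begin{cases} P_1(x \oplus k_0 \oplus t) \oplus k_1 \oplus t = u_2 \\ v_2 \oplus k_0 \oplus t = u_3; \end{cases}$$

(C-3) there exists $(t,x,y) \in Q_C$, $(u_3,v_3) \in Q_{P_3}$, and $v_2 \in V_2$ such that

$$\begin{cases} P_4^{-1}(y \oplus k_0 \oplus t) \oplus k_1 \oplus t = v_3 \\ u_3 \oplus k_0 \oplus t = v_2; \end{cases}$$

(C-4) there exists $(t,x,y), (t',x',y'), (t'',x'',y'') \in Q_C$ with $(t,x,y)$ distinct from $(t',x',y')$ and from $(t'',x'',y'')$ such that

$$\begin{cases} P_1(x \oplus k_0 \oplus t) \oplus t = P_1(x' \oplus k_0 \oplus t') \oplus t' \\ P_4^{-1}(y \oplus k_0 \oplus t) \oplus t = P_4^{-1}(y'' \oplus k_0 \oplus t'') \oplus t''; \end{cases}$$

(C-5) there exists $(t,x,y) \ne (t',x',y') \in Q_C$ such that

$$\begin{cases} P_1(x \oplus k_0 \oplus t) \oplus t = P_1(x' \oplus k_0 \oplus t') \oplus t' \\ t = t'; \end{cases}$$

(C-6) there exists $(t,x,y) \ne (t',x',y') \in Q_C$ such that

$$\begin{cases} P_4^{-1}(y \oplus k_0 \oplus t) \oplus t = P_4^{-1}(y' \oplus k_0 \oplus t') \oplus t' \\ t = t'; \end{cases}$$

(C-7) there exists $(t,x,y) \ne (t',x',y') \in Q_C$ and $u_2 \in U_2$ such that

$$\begin{cases} P_1(x \oplus k_0 \oplus t) \oplus k_1 \oplus t = u_2 \\ P_4^{-1}(y \oplus k_0 \oplus t) \oplus t = P_4^{-1}(y' \oplus k_0 \oplus t') \oplus t'; \end{cases}$$

(C-8) there exists $(t,x,y) \ne (t',x',y') \in Q_C$ and $v_3 \in V_3$ such that

$$\begin{cases} P_4^{-1}(y \oplus k_0 \oplus t) \oplus k_1 \oplus t = v_3 \\ P_1(x \oplus k_0 \oplus t) \oplus t = P_1(x' \oplus k_0 \oplus t') \oplus t'; \end{cases}$$

(C-9) there exists $(t,x,y) \ne (t',x',y') \in Q_C$ and $(u_2,v_2), (u_2',v_2') \in Q_{P_2}$ such that

$$\begin{cases} P_1(x \oplus k_0 \oplus t) \oplus k_1 \oplus t = u_2 \\ P_1(x' \oplus k_0 \oplus t') \oplus k_1 \oplus t' = u_2' \\ v_2 \oplus t = v_2' \oplus t'; \end{cases}$$

(C-10) there exists $(t,x,y) \ne (t',x',y') \in Q_C$ and $(u_3,v_3), (u_3',v_3') \in Q_{P_3}$ such that

$$\begin{cases} P_4^{-1}(y \oplus k_0 \oplus t) \oplus k_1 \oplus t = v_3 \\ P_4^{-1}(y' \oplus k_0 \oplus t') \oplus k_1 \oplus t' = v_3' \\ u_3 \oplus t = u_3' \oplus t'; \end{cases}$$

(C-11) $\alpha_2 \ge \sqrt{q_c}$;

(C-12) $\alpha_3 \ge \sqrt{q_c}$;

(C-13) $\beta_2 \ge \sqrt{q_c}$;

(C-14) $\beta_3 \ge \sqrt{q_c}$;

where

$$\alpha_2 \stackrel{\mathrm{def}}{=} |\{(t,x,y) \in Q_C : P_1(x \oplus k_0 \oplus t) \oplus k_1 \oplus t \in U_2\}|,$$

$$\alpha_3 \stackrel{\mathrm{def}}{=} |\{(t,x,y) \in Q_C : P_4^{-1}(y \oplus k_0 \oplus t) \oplus k_1 \oplus t \in V_3\}|,$$

$$\beta_2 \stackrel{\mathrm{def}}{=} |\{(t,x,y) \in Q_C : \exists (t',x',y') \ne (t,x,y),\ P_1(x \oplus k_0 \oplus t) \oplus t = P_1(x' \oplus k_0 \oplus t') \oplus t'\}|,$$

$$\beta_3 \stackrel{\mathrm{def}}{=} |\{(t,x,y) \in Q_C : \exists (t',x',y') \ne (t,x,y),\ P_4^{-1}(y \oplus k_0 \oplus t) \oplus t = P_4^{-1}(y' \oplus k_0 \oplus t') \oplus t'\}|.$$

Otherwise we say that $(P_1,P_4)$ is good. We denote $\Pi_{\mathrm{good}}$, resp. $\Pi_{\mathrm{bad}}$, the set of good, resp. bad pairs of permutations $(P_1,P_4)$ such that $P_1 \vdash Q_{P_1}$ and $P_4 \vdash Q_{P_4}$.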

In all the following, we denote $\Pi$ the set of pairs of permutations $(P_1,P_4)$ such that $P_1 \vdash Q_{P_1}$ and $P_4 \vdash Q_{P_4}$. The first step towards studying good transcripts will be to upper bound the probability that the pair $(P_1,P_4)$ is bad.

Fig. 3. The ten "collision" conditions (C-1) to (C-10) characterizing a bad pair of permutations $(P_1,P_4)$. Black dots correspond to pairs $(u_2,v_2) \in Q_{P_2}$ or $(u_3,v_3) \in Q_{P_3}$. Note that for (C-4) one might have $(t',x') = (t'',x'')$, and for (C-9) (resp. (C-10)) one might have $x \oplus t = x' \oplus t'$ (resp. $y \oplus t = y' \oplus t'$).

Lemma 4. For any integers $q_c$ and $q_p$ such that $q_p + q_c + 1 < N/2$, one has

$$\Pr[(P_1,P_4) \in \Pi_{\mathrm{bad}}] \le \frac{4 q_c^3 + 16 q_c^2 q_p + 4 q_c q_p^2}{N^2} + \frac{10 q_c^{3/2} + 4 q_c \sqrt{q_p} + 10 \sqrt{q_c}\, q_p}{N},$$

where the probability is taken over the uniformly random draw of $(P_1,P_4)$ in $\Pi$.

Proof. We upper bound the probabilities of the fourteen conditions in turn. We denote $\Pi_i$ the set of pairs of permutations $(P_1,P_4) \in \Pi$ satisfying condition (C-$i$).

Condition (C-1). Fix $(t,x,y) \in Q_C$, $u_2 \in U_2$, and $v_3 \in V_3$. Note that if $x \oplus k_0 \oplus t = u_1$ for some $(u_1,v_1) \in Q_{P_1}$, then $v_1 \oplus k_1 \oplus t$ cannot be equal to $u_2$ since otherwise $\tau$ would satisfy (B-2). Similarly, if $y \oplus k_0 \oplus t = v_4$ for some $(u_4,v_4) \in Q_{P_4}$, then $u_4 \oplus k_1 \oplus t$ cannot be equal to $v_3$ since otherwise $\tau$ would satisfy (B-3). On the other hand, if $x \oplus k_0 \oplus t \notin U_1$ and $y \oplus k_0 \oplus t \notin V_4$, then the probability over $(P_1,P_4) \leftarrow_\$ \Pi$ that

$$\begin{cases} P_1(x \oplus k_0 \oplus t) = u_2 \oplus k_1 \oplus t \\ P_4^{-1}(y \oplus k_0 \oplus t) = v_3 \oplus k_1 \oplus t \end{cases}$$

is at most $1/(N - q_p)^2 \le 4/N^2$. (In more detail, if $u_2 \oplus k_1 \oplus t \in V_1$ or $v_3 \oplus k_1 \oplus t \in U_4$, then this probability is zero, whereas otherwise it is exactly $1/(N - q_p)^2$.) Summing over the at most $q_c q_p^2$ possibilities for $(t,x,y)$, $u_2$, and $v_3$ yields

$$\Pr[(P_1,P_4) \in \Pi_1] \le \frac{4 q_c q_p^2}{N^2}.$$
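The per-permutation probability used above (zero if the target output is already taken, $1/(N - q_p)$ otherwise) can be confirmed by enumerating all permutations of a very small domain that extend a fixed set of recorded queries. The sketch below is illustrative only: the function name, $N = 5$, and the recorded pair are hypothetical choices.

```python
from fractions import Fraction
from itertools import permutations


def prob_value(N, known, x, u):
    """Exact Pr[P(x) = u] for P uniform among permutations of range(N) extending `known`."""
    consistent = [
        p for p in permutations(range(N))
        if all(p[a] == b for a, b in known.items())
    ]
    hits = sum(1 for p in consistent if p[x] == u)
    return Fraction(hits, len(consistent))


N = 5
known = {0: 1}  # one recorded query P(0) = 1, i.e. q_p = 1
fresh = prob_value(N, known, x=2, u=3)  # unqueried input, unused output
taken = prob_value(N, known, x=2, u=1)  # unqueried input, output already in V_1
print(fresh, taken)
```

Since $P_1$ and $P_4$ are drawn independently, the two equations in (C-1) multiply, which gives the $1/(N - q_p)^2$ term.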

Conditions (C-2) and (C-3). We consider (C-2), the reasoning for (C-3) is similar. Fix $(t,x,y) \in Q_C$, $(u_2,v_2) \in Q_{P_2}$, and $u_3 \in U_3$. Note first that for (C-2) to be satisfied, one must have $v_2 \oplus k_0 \oplus t = u_3$, and there are by definition at most $\alpha_{2,3}$ triplets $((t,x,y), v_2, u_3)$ satisfying this equality. If $x \oplus k_0 \oplus t = u_1$ for some $(u_1,v_1) \in Q_{P_1}$, then $v_1 \oplus k_1 \oplus t$ cannot be equal to $u_2$ since otherwise $\tau$ would satisfy (B-2). On the other hand, if $x \oplus k_0 \oplus t \notin U_1$, then the probability that $P_1(x \oplus k_0 \oplus t) = u_2 \oplus k_1 \oplus t$ is at most $1/(N - q_p) \le 2/N$ (it is zero if $u_2 \oplus k_1 \oplus t \in V_1$, and $1/(N - q_p)$ otherwise). Summing over the at most $\alpha_{2,3}$ possibilities for $(t,x,y)$, $(u_2,v_2)$, and $u_3$, with $\alpha_{2,3} < q_p \sqrt{q_c}$ since otherwise $\tau$ would satisfy (B-6), we obtain

$$\Pr[(P_1,P_4) \in \Pi_2] \le \frac{2 q_p \sqrt{q_c}}{N}.$$

Similarly,

$$\Pr[(P_1,P_4) \in \Pi_3] \le \frac{2 q_p \sqrt{q_c}}{N}.$$

Condition (C-4). Fix $(t,x,y), (t',x',y'), (t'',x'',y'') \in Q_C$ with $(t,x,y)$ distinct from $(t',x',y')$ and from $(t'',x'',y'')$. First, note that if $x \oplus k_0 \oplus t = x' \oplus k_0 \oplus t'$ or $y \oplus k_0 \oplus t = y'' \oplus k_0 \oplus t''$, then (C-4) cannot be satisfied. Hence, we assume that none of these two equalities holds. We consider three cases. Assume first that $x \oplus k_0 \oplus t = u_1$ for some $(u_1,v_1) \in Q_{P_1}$. Note that there are at most $\alpha_1$ possibilities for $(t,x,y)$, and $\alpha_1 < \sqrt{q_c}/2$ since otherwise $\tau$ would satisfy (B-4). Moreover $y \oplus k_0 \oplus t \notin V_4$ since otherwise $\tau$ would satisfy (B-1). Hence, the probability that

$$P_4^{-1}(y \oplus k_0 \oplus t) \oplus t = P_4^{-1}(y'' \oplus k_0 \oplus t'') \oplus t''$$

is at most $1/(N - q_p - 1) \le 2/N$. (In more detail, if $y'' \oplus k_0 \oplus t'' \in V_4$, then this probability is either zero if $P_4^{-1}(y'' \oplus k_0 \oplus t'') \oplus t \oplus t'' \in U_4$, or exactly $1/(N - q_p)$ otherwise, whereas if $y'' \oplus k_0 \oplus t'' \notin V_4$, then this probability is at most $1/(N - q_p - 1)$.) Summing over the at most $\sqrt{q_c}/2 \times q_c$ possibilities for $(t,x,y)$ and $(t'',x'',y'')$, the probability of this first case is at most $q_c^{3/2}/N$. The second case where $y \oplus k_0 \oplus t \in V_4$ is handled similarly. Finally, consider the case where $x \oplus k_0 \oplus t \notin U_1$ and $y \oplus k_0 \oplus t \notin V_4$. Then the probability that

$$\begin{cases} P_1(x \oplus k_0 \oplus t) \oplus t = P_1(x' \oplus k_0 \oplus t') \oplus t' \\ P_4^{-1}(y \oplus k_0 \oplus t) \oplus t = P_4^{-1}(y'' \oplus k_0 \oplus t'') \oplus t'' \end{cases}$$

is at most $1/(N - q_p - 1)^2 \le 4/N^2$. Summing over the at most $q_c^3$ possibilities for $(t,x,y)$, $(t',x',y')$, and $(t'',x'',y'')$, the probability of this third case is at most $4 q_c^3/N^2$. Overall, we obtain

$$\Pr[(P_1,P_4) \in \Pi_4] \le \frac{4 q_c^3}{N^2} + \frac{2 q_c^{3/2}}{N}.$$

Conditions (C-5) and (C-6). These conditions cannot be satisfied. Indeed, assume that there exists $(t,x,y) \ne (t',x',y') \in Q_C$ satisfying (C-5). Since $t = t'$, then $x \ne x'$ by the assumption that the distinguisher never makes pointless queries. This obviously implies that $P_1(x \oplus k_0 \oplus t) \oplus t \ne P_1(x' \oplus k_0 \oplus t') \oplus t'$, a contradiction. The reasoning is similar for (C-6). Hence,

$$\Pr[(P_1,P_4) \in \Pi_5] = \Pr[(P_1,P_4) \in \Pi_6] = 0.$$

Conditions (C-7) and (C-8). We consider condition (C-7). Fix queries $(t,x,y) \ne (t',x',y') \in Q_C$ and $u_2 \in U_2$. We will consider two cases: first, the case where $y \oplus k_0 \oplus t \in V_4$, and then the case where $y \oplus k_0 \oplus t \notin V_4$. For both cases, note that if $x \oplus k_0 \oplus t = u_1$ for some $(u_1,v_1) \in Q_{P_1}$, then $v_1 \oplus k_1 \oplus t$ cannot be equal to $u_2$ since otherwise $\tau$ would satisfy (B-2). Hence, we can assume that $x \oplus k_0 \oplus t \notin U_1$. It follows that the probability that

$$P_1(x \oplus k_0 \oplus t) \oplus k_1 \oplus t = u_2$$

is at most $1/(N - q_p) \le 2/N$ (it is zero if $u_2 \oplus k_1 \oplus t \in V_1$, and $1/(N - q_p)$ otherwise). Summing over the at most $\alpha_4$ queries $(t,x,y) \in Q_C$ such that $y \oplus k_0 \oplus t \in V_4$, with $\alpha_4 < \sqrt{q_c}/2$ since otherwise $\tau$ would satisfy (B-5), and the $q_p$ possibilities for $u_2$, we see that the first case happens with probability at most $q_p \sqrt{q_c}/N$. Assume now that $y \oplus k_0 \oplus t \notin V_4$. Then the probability that

$$P_4^{-1}(y \oplus k_0 \oplus t) \oplus t = P_4^{-1}(y' \oplus k_0 \oplus t') \oplus t'$$

is at most $1/(N - q_p - 1) \le 2/N$. (In more detail, if $y \oplus k_0 \oplus t = y' \oplus k_0 \oplus t'$, then it can easily be seen that it cannot hold, whereas if $y \oplus k_0 \oplus t \ne y' \oplus k_0 \oplus t'$, the equation holds with probability at most $1/(N - q_p - 1)$.) Summing over the at most $q_c^2 q_p$ possibilities for $(t,x,y)$, $(t',x',y')$, and $u_2$, we see that the probability of the second case is at most $4 q_c^2 q_p/N^2$. Overall,

$$\Pr[(P_1,P_4) \in \Pi_7] \le \frac{q_p \sqrt{q_c}}{N} + \frac{4 q_c^2 q_p}{N^2}.$$

Similarly, one has

$$\Pr[(P_1,P_4) \in \Pi_8] \le \frac{q_p \sqrt{q_c}}{N} + \frac{4 q_c^2 q_p}{N^2}.$$


Conditions (C-9) and (C-10). Consider condition (C-9). First note that, if the condition is satisfied, we have $x \oplus k_0 \oplus t \notin U_1$, $x' \oplus k_0 \oplus t' \notin U_1$, $u_2 \oplus k_1 \oplus t \notin V_1$ and $u_2' \oplus k_1 \oplus t' \notin V_1$, otherwise (B-2) is fulfilled. Moreover, if $(u_2,v_2) = (u_2',v_2')$, then $t = t'$, thus $x = x'$, which is impossible. Hence we must have $(u_2,v_2) \ne (u_2',v_2')$. The condition can be divided into two conditions:

9.1 there exists $(t,x,y) \ne (t',x',y') \in Q_C$ and $(u_2,v_2) \ne (u_2',v_2') \in Q_{P_2}$ such that $x \oplus t = x' \oplus t'$, $P_1(x \oplus k_0 \oplus t) = u_2 \oplus k_1 \oplus t$ and $P_1(x' \oplus k_0 \oplus t') = u_2' \oplus k_1 \oplus t'$ and $v_2 \oplus t = v_2' \oplus t'$;

9.2 there exists $(t,x,y) \ne (t',x',y') \in Q_C$ and $(u_2,v_2) \ne (u_2',v_2') \in Q_{P_2}$ such that $x \oplus t \ne x' \oplus t'$, $P_1(x \oplus k_0 \oplus t) = u_2 \oplus k_1 \oplus t$ and $P_1(x' \oplus k_0 \oplus t') = u_2' \oplus k_1 \oplus t'$ and $v_2 \oplus t = v_2' \oplus t'$.

In the first case, one has

$$u_2 \oplus k_1 \oplus t = P_1(x \oplus k_0 \oplus t) = P_1(x' \oplus k_0 \oplus t') = u_2' \oplus k_1 \oplus t',$$

thus $u_2 \oplus u_2' = t \oplus t' = v_2 \oplus v_2'$. Hence the first condition implies the following one: there exists $(t,x,y) \in Q_C$ and $(u_2,v_2) \ne (u_2',v_2') \in Q_{P_2}$ such that $P_1(x \oplus k_0 \oplus t) = u_2 \oplus k_1 \oplus t$ and $u_2 \oplus u_2' = v_2 \oplus v_2'$, with $x \oplus k_0 \oplus t \notin U_1$ and $u_2 \oplus k_1 \oplus t \notin V_1$. Since $\nu_2 < \sqrt{q_p}$, the number of suitable $u_2 \in U_2$ is lower than $\sqrt{q_p}$ and the probability that this first condition is fulfilled is at most $q_c \sqrt{q_p}/(N - q_p) \le 2 q_c \sqrt{q_p}/N$. For the second condition, fix any queries $(t,x,y) \ne (t',x',y') \in Q_C$ such that $x \oplus t \ne x' \oplus t'$, $x \oplus k_0 \oplus t \notin U_1$, $x' \oplus k_0 \oplus t' \notin U_1$ and $(u_2,v_2) \in Q_{P_2}$. If $v_2 \oplus t \oplus t' \notin V_2$, the condition cannot be fulfilled. Otherwise let $(u_2',v_2') \in Q_{P_2}$ be the unique query such that $v_2 \oplus t = v_2' \oplus t'$. Then the probability that $P_1(x \oplus k_0 \oplus t) = u_2 \oplus k_1 \oplus t$ and $P_1(x' \oplus k_0 \oplus t') = u_2' \oplus k_1 \oplus t'$ is at most $\frac{1}{(N - q_p)(N - q_p - 1)}$. Finally, by summing over every possible tuple of queries, and by taking into account the condition 9.1, one has

$$\Pr[(P_1,P_4) \in \Pi_9] \le \frac{2 q_c \sqrt{q_p}}{N} + \frac{4 q_c^2 q_p}{N^2}.$$

Similarly,

$$\Pr[(P_1,P_4) \in \Pi_{10}] \le \frac{2 q_c \sqrt{q_p}}{N} + \frac{4 q_c^2 q_p}{N^2}.$$


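Conditions (C-11) and (C-12). We see $\alpha_2$ (resp. $\alpha_3$) as a random variable over the choice of $P_1$ (resp. $P_4$). Note that

$$\alpha_2 = |\{(t,x,y) \in Q_C : P_1(x \oplus k_0 \oplus t) \oplus k_1 \oplus t \in U_2\}|$$
$$\phantom{\alpha_2} = |\{(t,x,y) \in Q_C : x \oplus k_0 \oplus t \notin U_1,\ P_1(x \oplus k_0 \oplus t) \oplus k_1 \oplus t \in U_2\}|,$$

because, if $x \oplus k_0 \oplus t \in U_1$ and $P_1(x \oplus k_0 \oplus t) \oplus k_1 \oplus t \in U_2$, then (B-2) is fulfilled. We denote $Q_{C,1}$ the subset of queries $(t,x,y) \in Q_C$ such that $x \oplus k_0 \oplus t \notin U_1$. Then

$$\mathbb{E}[\alpha_2] = \sum_{(t,x,y) \in Q_{C,1}} \sum_{u_2 \in U_2} \Pr[P_1(x \oplus k_0 \oplus t) = u_2 \oplus k_1 \oplus t] \le \sum_{(t,x,y) \in Q_{C,1}} \sum_{u_2 \in U_2} \frac{1}{N - q_p} \le \frac{2 q_c q_p}{N}.$$

Using Markov's inequality, we get

$$\Pr[(P_1,P_4) \in \Pi_{11}] \le \frac{2 q_p \sqrt{q_c}}{N}.$$

Similarly,

$$\Pr[(P_1,P_4) \in \Pi_{12}] \le \frac{2 q_p \sqrt{q_c}}{N}.$$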


Conditions (C-13) and (C-14). Consider condition (C-13). Note that

$$\beta_2 = |\{(t,x,y) \in Q_C : \exists (t',x',y') \ne (t,x,y),\ P_1(x \oplus k_0 \oplus t) \oplus t = P_1(x' \oplus k_0 \oplus t') \oplus t'\}|$$
$$\phantom{\beta_2} \le \alpha_1 + |\{(t,x,y) \in Q_C : x \oplus k_0 \oplus t \notin U_1 \text{ and } \exists (t',x',y') \ne (t,x,y),\ P_1(x \oplus k_0 \oplus t) \oplus t = P_1(x' \oplus k_0 \oplus t') \oplus t'\}|.$$

We denote $\beta_2'$ the last term of this sum. Thus

$$\mathbb{E}[\beta_2'] \le \sum_{(t,x,y) \in Q_{C,1}} \sum_{(t',x',y') \ne (t,x,y)} \Pr[P_1(x \oplus k_0 \oplus t) \oplus t = P_1(x' \oplus k_0 \oplus t') \oplus t'] \le \frac{q_c^2}{N - q_p - 1} \le \frac{2 q_c^2}{N}.$$

This inequality holds because, if $x \oplus t = x' \oplus t'$, then $t \ne t'$ since the distinguisher never makes pointless queries, thus $P_1(x \oplus k_0 \oplus t) \oplus t = P_1(x' \oplus k_0 \oplus t') \oplus t'$ cannot be fulfilled. Otherwise,

$$\Pr[P_1(x \oplus k_0 \oplus t) \oplus t = P_1(x' \oplus k_0 \oplus t') \oplus t'] \le \frac{1}{N - q_p - 1}.$$

Finally, since (B-4) is not fulfilled, $\alpha_1 < \sqrt{q_c}/2$. Thus $\beta_2 \ge \sqrt{q_c}$ implies $\beta_2' \ge \sqrt{q_c}/2$. Hence, using Markov's inequality,

$$\Pr[(P_1,P_4) \in \Pi_{13}] \le \Pr[\beta_2' \ge \sqrt{q_c}/2] \le \frac{4 q_c^{3/2}}{N}.$$

Similarly,

$$\Pr[(P_1,P_4) \in \Pi_{14}] \le \frac{4 q_c^{3/2}}{N}.$$

The result follows by a union bound over all conditions. □

We are now ready for the second step of the reasoning. 

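Definition 3. Fix any pair of permutations $(P_1,P_4)$ such that $P_1 \vdash Q_{P_1}$ and $P_4 \vdash Q_{P_4}$. We define a new query transcript $Q_C'$ depending on $(P_1,P_4)$ as

$$Q_C' = \{(t, P_1(x \oplus k_0 \oplus t), P_4^{-1}(y \oplus k_0 \oplus t)) : (t,x,y) \in Q_C\}.$$

We also denote

$$p(\tau, P_1, P_4) \stackrel{\mathrm{def}}{=} \Pr\left[P_2, P_3 \leftarrow_\$ \mathsf{P}(n) : \mathrm{TEM}^{P_2,P_3}_{k_1,k_0} \vdash Q_C' \wedge (P_2 \vdash Q_{P_2}) \wedge (P_3 \vdash Q_{P_3})\right].$$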


Lemma 5. One has 

PrjXe = t] > p(r,Pi,P 4 ) 

Pr [ T “ ■')- «w - «p )') 2 nr., i/m„ • 

Proof. Clearly, once $P_1$ and $P_4$ are fixed, $\mathrm{TEM}^{P_1,P_2,P_3,P_4}_{k_0,k_1} \vdash Q_C$ is equivalent to $\mathrm{TEM}^{P_2,P_3}_{k_1,k_0} \vdash Q_C'$. Hence,

p(r) = £ Pr [(Pi, P 4 ) <— $ il : (Pi = Pi) A (P 4 = P 4 )] p(r, Pi, P 4 ) 

(P i,P 4 )6J7 

> y' p( r , A, P 4 ) 

“ _ ((JV - <7„)!) 2 ’ 

(Pi,p 4 )e/T g0 od p 

The result follows from Eq. (4). □ 


We can now directly appeal to a previous result by Cogliati et al. [7].

Lemma 6. Let $q_c$ and $q_p$ be two positive integers such that $q_p + 3 q_c < N/2$. Fix any pair of permutations $(P_1,P_4) \in \Pi_{\mathrm{good}}$. Then


p(r,Pi,P 4 ) ( 4q c (q p + 2q c ) 2 14ql /2 + 4^/q2q p 

utp m qi - { n 2 + n 


Proof. One can check that the queries transcript r' = (Q' C: Qp 21 Qp 3 ) satisfies 
exactly the conditions defining a good transcript as per [7, Definition 2]. More- 
over, the ratio p (r, Pi, P4)/ YYiLi is exactly the ratio of the probabilities 

to get t' in the real and in the ideal world once a good pair (Pi,Pi) is fixed. 
Hence, we can apply [7, Lemma 6] that directly yields the result. 6 □ 

We are now ready to prove the main lemma of this section. 


Lemma 7. Let $q_c$ and $q_p$ be two positive integers such that $q_p + 3 q_c + 1 < N/2$. One has

$$\frac{\Pr[T_{\mathrm{re}} = \tau]}{\Pr[T_{\mathrm{id}} = \tau]} \ge 1 - \frac{20 q_c^3 + 32 q_c^2 q_p + 8 q_c q_p^2}{N^2} - \frac{24 q_c^{3/2} + 4 q_c \sqrt{q_p} + 14 \sqrt{q_c}\, q_p}{N}.$$


Proof. From Lemmas 5 and 6, one has 


Pr[T re = 


E 


p(r,Pi,P 4 ) 


Pi i r “ = T ' “ «* - nr- 

y. f - 1 _ 4<? c ( ( Zp + 2g c ) 2 _ 14</</ +4 ■^/q^q p \ 

- I N 2 N I ^ 


_ 7 4 q c (q p + 2 q c ) 2 14 qj 2 + 4^q~ c q p 

~ \ N 2 N 


n SO od (( N q P^ 

| -^good | 


((N-q P )\Y 


= (i - - ltf b 4v E p ' K p - p *) ^ 


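where the last probability is taken over the random draw of $(P_1,P_4)$ from $\Pi$, the set of pairs of permutations satisfying $P_1 \vdash Q_{P_1}$ and $P_4 \vdash Q_{P_4}$. Using Lemma 4, one has

$$\frac{\Pr[T_{\mathrm{re}} = \tau]}{\Pr[T_{\mathrm{id}} = \tau]} \ge \left(1 - \frac{4 q_c^3 + 16 q_c^2 q_p + 4 q_c q_p^2}{N^2} - \frac{10 q_c^{3/2} + 4 q_c \sqrt{q_p} + 10 \sqrt{q_c}\, q_p}{N}\right) \left(1 - \frac{4 q_c (q_p + 2 q_c)^2}{N^2} - \frac{14 q_c^{3/2} + 4 \sqrt{q_c}\, q_p}{N}\right)$$
$$\ge 1 - \frac{20 q_c^3 + 32 q_c^2 q_p + 8 q_c q_p^2}{N^2} - \frac{24 q_c^{3/2} + 4 q_c \sqrt{q_p} + 14 \sqrt{q_c}\, q_p}{N}.$$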

6 Even though this might not be apparent to the reader unfamiliar with [7], the proof of Lemma 7 in that paper does not rely on the xor-universal hash functions $h_1$ and $h_2$ appearing in the definition of good transcripts of [7].

Concluding. We are now ready to prove Theorem 1. Combining Lemmas 1, 3, and 7, one has

$$\mathbf{Adv}^{\mathrm{cca}}_{\mathrm{TEM}[n,4,\mathbf{f}]}(q_c, q_p) \le \frac{2 q_c^2 q_p + 3 q_c q_p^2}{N^2} + \frac{(5 + 3\sqrt{n})\, q_p \sqrt{q_c} + 4 q_p^{3/2} + 2}{N} + \frac{20 q_c^3 + 32 q_c^2 q_p + 8 q_c q_p^2}{N^2} + \frac{24 q_c^{3/2} + 4 q_c \sqrt{q_p} + 14 \sqrt{q_c}\, q_p}{N}$$
$$= \frac{20 q_c^3 + 34 q_c^2 q_p + 11 q_c q_p^2}{N^2} + \frac{24 q_c^{3/2} + 4 q_c \sqrt{q_p} + (19 + 3\sqrt{n}) \sqrt{q_c}\, q_p + 4 q_p^{3/2} + 2}{N}.$$

Since the result holds trivially when $q_c^3 > N^2$, $q_c^2 q_p > N^2$, or $q_c q_p^2 > N^2$, we can assume that $q_c^3 \le N^2$, $q_c^2 q_p \le N^2$, and $q_c q_p^2 \le N^2$, so that

$$\frac{q_c^3}{N^2} \le \frac{q_c^{3/2}}{N}, \qquad \frac{q_c^2 q_p}{N^2} \le \frac{q_c \sqrt{q_p}}{N}, \qquad \text{and} \qquad \frac{q_c q_p^2}{N^2} \le \frac{\sqrt{q_c}\, q_p}{N}.$$

Thus

$$\mathbf{Adv}^{\mathrm{cca}}_{\mathrm{TEM}[n,4,\mathbf{f}]}(q_c, q_p) \le \frac{44 q_c^{3/2} + 38 q_c \sqrt{q_p} + (30 + 3\sqrt{n})\, q_p \sqrt{q_c} + 4 q_p^{3/2} + 2}{N},$$

which concludes the proof of Theorem 1.
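To get a feel for what the final inequality means in practice, the following sketch evaluates its right-hand side numerically. The coefficients are taken from the last display as read above; the function name and the sample parameters ($n = 128$, equal numbers of construction and permutation queries) are illustrative assumptions, and floating-point arithmetic is used, so the values are approximate.

```python
import math


def tem4_cca_bound(n, q_c, q_p):
    """Right-hand side of the final advantage bound, with N = 2**n (floating point)."""
    N = 2.0 ** n
    numerator = (
        44 * q_c ** 1.5
        + 38 * q_c * math.sqrt(q_p)
        + (30 + 3 * math.sqrt(n)) * q_p * math.sqrt(q_c)
        + 4 * q_p ** 1.5
        + 2
    )
    return numerator / N


n = 128
for log_q in (40, 64, 80, 86):
    q = 2.0 ** log_q
    print(f"q_c = q_p = 2^{log_q}: bound ~ {tem4_cca_bound(n, q, q):.3e}")
```

With $q_c = q_p = q$, every term scales as $q^{3/2}/N$, so the bound only becomes vacuous when $q$ approaches $N^{2/3} = 2^{2n/3}$, well beyond the birthday bound $2^{n/2}$.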
Abstract. In CRYPTO 2003, Halevi and Rogaway proposed CMC, a tweakable enciphering scheme (TES) based on a blockcipher. It requires two blockcipher keys and it is not inverse-free (i.e., the decryption algorithm uses the inverse (decryption) of the underlying blockcipher). We present here a new inverse-free, single-keyed TES. Our construction is a tweakable strong pseudorandom permutation (TSPRP), i.e., it is secure against chosen-plaintext-ciphertext adversaries assuming that the underlying blockcipher is a pseudorandom permutation (PRP), i.e., secure against chosen-plaintext adversaries. In comparison, the SPRP assumption on the blockcipher is required for the TSPRP security of CMC. Our scheme can be viewed as a mixture of type-1 and type-3 Feistel ciphers and so we call it FMix or mixed-type Feistel cipher.


Keywords: (Tweakable strong) pseudorandom permutation • Coefficient H Technique • Encipher • CMC • Feistel cipher


1 Introduction 

A tweakable enciphering scheme (TES) is a length-preserving encryption 
scheme that takes a tweak as an additional input. In other words, for each tweak, 
TES computes a ciphertext preserving length of the plaintext. Preserving length 
can be very useful in applications such as disk-sector encryption (as addressed 
by the IEEE SISWG P1619), where a length-preserving encryption preserves 
the file size after encryption. When a tweakable enciphering scheme is used, 
the disk sectors can serve as tweaks. Other applications of enciphering schemes 
could include bandwidth-efficient network protocols and security-retrofitting of 
old communication protocols. 

Examples based on Paradigms. There are four major paradigms of tweakable 
enciphering schemes. Almost all enciphering schemes fall in one of the following 
categories. 

- Feistel Structure: 2-block Feistel design was used in early block ciphers like Lucifer [4,22] and DES [23]. Luby and Rackoff gave a security proof of Feistel ciphers [12], and later the design was generalised to obtain inverse-free enciphering of longer messages [17]. Examples: Naor-Reingold Hash [16], GFN [10], matrix representations [1].

- Hash-Counter-Hash: Two layers of universal hash with a counter mode of encryption in between. Examples: XCB [13], HCTR [25], HCH [2].

- Hash-Encrypt-Hash: Two layers of universal hash with an ECB mode of encryption in between. Examples: PEP [3], TET [6], HEH [21].

- Encrypt-Mix-Encrypt: Two encryption layers with a mixing layer in between. Examples: EME [8], EME* [5] (with ECB encryption layer), CMC [7] (with CBC encryption layer).

Among all these constructions, the examples from Feistel cipher and Encrypt- 
mix-encrypt paradigms are based on blockciphers alone (i.e., no field multiplica- 
tion or other primitive is used). Now we take a closer look at CMC encryption. 

CMC. In CRYPTO 2003, Halevi and Rogaway proposed CMC, a tweakable 
enciphering scheme (TES) based on a blockcipher (Fig. 1). It accepts only plain- 
texts of size a multiple of n, the size of the underlying blockcipher. We call each 
n-bit segment of the plaintext a block. The CMC construction has the following 
problems: 

- For an encryption using $e_K$, the decryption needs $e_K^{-1}$. In a combined hardware implementation, the footprint size (e.g., the number of gates or slices) goes up;

- The security proof of CMC relied on the stronger assumption SPRP (Strong 
Pseudo-Random Permutation) on the underlying blockcipher; 

- Tweak is processed using an independent key, and the proposed single-key 
variant uses an extra call to the blockcipher. 




Fig. 1. CMC for four blocks, with tweak $T$ and $M = 2(X \oplus Y)$. Here 2 represents a primitive element of a finite field over $\{0,1\}^n$.

Fig. 2. The round function of two types of generalised Feistel networks (Feistel type-1 and Feistel type-3) for four-block inputs. A similar definition can be applied for any number of blocks.


Feistel Cipher: An Inverse-Free Cipher. To resolve the first issue mentioned above, one can fall back on a Feistel network. For inverse-free constructions, the main approach so far has been to generalise the classical 2-block Feistel network to work for longer messages. Two of the interesting approaches were the type-1 Feistel network and the type-3 Feistel network (Fig. 2). In [10], it is shown that to encrypt an $\ell$-block plaintext, type-1 and type-3 need $4\ell - 2$ and $2\ell + 2$ rounds respectively for achieving birthday security, which translates to $4\ell - 2$ and $2\ell^2 - 2$ invocations of the underlying blockcipher. However, their result is meant for providing a security-performance trade-off and there is a provision for having beyond-birthday security.

One recent inverse-free construction based on Feistel networks is the AEZ-core, 
which forms part of the implementation of AEZ [9]. It belongs to the Encrypt- 
Mix-Encrypt paradigm, where the encryption uses a Feistel structure. It requires 
five blockcipher calls for every two plaintext blocks, but is highly parallelizable. 


1.1 Our Contribution 

In this paper, we address all the issues present in CMC in our construction. We use a mixture of type-1 and type-3 for our construction (hence the name FMix) to have an inverse-free construction which minimizes the number of blockcipher calls. FMix applies a simple balanced regular function $b$. Except for this, it looks exactly like the composition of $\ell + 1$ rounds of type-1 and one round of type-3 Feistel cipher. The features of FMix can be summarized as follows (see Table 1 for a comparison study):

1. FMix is inverse-free, i.e., it needs the same $f$ for both encryption and decryption, giving a low footprint in the combined hardware implementation.

2. Because it is inverse-free, an important improvement is on the security requirement of $e_K$. CMC relies upon an SPRP-secure $e_K$, while our construction just needs a PRF-secure $e_K$. This can have significant practical implications in reducing the cost of implementation.

3. The tweak is processed through the same $f$, removing the requirement of an extra independent blockcipher key.

4. To encrypt a message with $\ell$ blocks and a tweak (a single block), CMC needs $2\ell + 1$ calls to the blockcipher $e$. Its variant (which eliminates the independent key), however, uses $2\ell + 2$ calls to $e$. Our construction requires $2\ell + 1$ calls, without needing the independent key.
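The call counts quoted in this section are easy to tabulate for concrete message lengths. The sketch below simply encodes the formulas stated in the text; the function names are hypothetical, and the cross-check for type-3 assumes that one type-3 round on $\ell$ blocks costs $\ell - 1$ blockcipher calls.

```python
def blockcipher_calls(num_blocks):
    """Blockcipher invocations per message of `num_blocks` blocks, as quoted in the text."""
    l = num_blocks
    return {
        "CMC": 2 * l + 1,
        "CMC (single-key variant)": 2 * l + 2,
        "type-1 Feistel": 4 * l - 2,
        "type-3 Feistel": 2 * l * l - 2,
        "FMix": 2 * l + 1,
    }


def type3_calls_from_rounds(num_blocks):
    """Cross-check: (2l + 2) rounds, assuming l - 1 calls per type-3 round."""
    return (2 * num_blocks + 2) * (num_blocks - 1)


for l in (2, 4, 32):
    counts = blockcipher_calls(l)
    print(l, counts, type3_calls_from_rounds(l) == counts["type-3 Feistel"])
```

For a 32-block message this gives 65 calls for FMix against 126 for type-1 and 2046 for type-3, which is the gap the construction is designed to close.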


Table 1. A comparison of some blockcipher-based TES. The columns are as follows: (1) number of blockcipher calls, (2) number of keys, (3) number of sequential layers with full parallelization, (4) security assumption on the underlying blockcipher, (5) whether it is inverse-free. (CMC' is a "natively tweakable" variant of CMC, as described in [7].)

Schemes             #BC              #Key             #Layers        BC-security   Inverse-free?
CMC                 $2\ell + 1$      2                $\ell + 2$     SPRP          NO
CMC'                $2\ell + 2$      2                $\ell + 2$     SPRP          NO
EME                 $2\ell + 3$      1                4              SPRP          NO
GFN-1               $4\ell - 2$      $4\ell - 2$      $4\ell - 2$    PRP           YES
GFN-3               $2\ell^2 - 2$    $2\ell^2 - 2$    $2\ell - 2$    PRP           YES
AEZ-core            $\frac{5}{2}\ell$  1              5              PRP           YES
FMix (this paper)   $2\ell + 1$      1                $\ell + 3$     PRP           YES


2 Preliminaries 

2.1 Tweakable Encryption Schemes 

This paper proposes a new tweakable encryption scheme, so we begin by describing what we mean by that. Formally, with a tweakable (deterministic) encryption scheme we associate four finite sets of binary strings: the message space $\mathcal{M}$, the tweak space $\mathcal{T}$, the ciphertext space $\mathcal{C}$, and the key space $\mathcal{K}$. The encryption function $e : \mathcal{K} \times \mathcal{T} \times \mathcal{M} \to \mathcal{C}$ and the corresponding decryption function $d : \mathcal{K} \times \mathcal{T} \times \mathcal{C} \to \mathcal{M}$ are required to satisfy the following (known as the correctness requirement):

$$\forall (K,T,P) \in \mathcal{K} \times \mathcal{T} \times \mathcal{M}, \quad d(K, T, e(K, T, P)) = P.$$

We also write $e(K,T,P)$ as $e_K(T,P)$ and $d(K,T,C)$ as $e_K^{-1}(T,C)$. We call a tweakable encryption scheme a tweakable enciphering scheme (TES) if for every plaintext $P$, key $K \in \mathcal{K}$ and tweak $T \in \mathcal{T}$, $|e(K,T,P)| = |P|$ (i.e., it preserves length).
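The correctness and length-preservation requirements can be made concrete with a toy example. The sketch below is not FMix and is not meant to be secure: it is a plain 4-round balanced Feistel network whose round function is SHAKE-256 over the key, tweak, round index and one half of the state, chosen only because it is inverse-free (decryption reuses the same forward round function). All names are hypothetical.

```python
import hashlib

ROUNDS = 4  # illustrative round count, not a security recommendation


def _round_function(key, tweak, rnd, half, out_len):
    """Keyed, tweaked round function; only ever evaluated in the forward direction."""
    data = (
        bytes([rnd])
        + len(key).to_bytes(2, "big") + key
        + len(tweak).to_bytes(2, "big") + tweak
        + half
    )
    return hashlib.shake_256(data).digest(out_len)


def _xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))


def encipher(key, tweak, plaintext):
    """Toy e(K, T, P): balanced Feistel over an even-length byte string."""
    if not plaintext or len(plaintext) % 2:
        raise ValueError("toy scheme needs a non-empty, even-length message")
    h = len(plaintext) // 2
    left, right = plaintext[:h], plaintext[h:]
    for rnd in range(ROUNDS):
        left, right = right, _xor(left, _round_function(key, tweak, rnd, right, h))
    return left + right


def decipher(key, tweak, ciphertext):
    """Toy d(K, T, C): undo the rounds in reverse order with the same round function."""
    h = len(ciphertext) // 2
    left, right = ciphertext[:h], ciphertext[h:]
    for rnd in reversed(range(ROUNDS)):
        left, right = _xor(right, _round_function(key, tweak, rnd, left, h)), left
    return left + right


key, tweak, message = b"toy key", b"sector-7", b"disk sector data"
ciphertext = encipher(key, tweak, message)
print(len(ciphertext) == len(message), decipher(key, tweak, ciphertext) == message)
```

Changing the tweak while keeping the key and message fixed produces an unrelated ciphertext, which is the behaviour the tweak input is meant to provide, for instance with the disk-sector number as the tweak.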


Random Function. At the heart of most encryption schemes lies the notion of a random function. Given a domain $\mathcal{D}$ and a range $\mathcal{R}$, a random function is a function chosen uniformly from the class of all functions from $\mathcal{D}$ to $\mathcal{R}$ (denoted $\mathcal{R}^{\mathcal{D}}$). Some elementary calculations show that for distinct $x_1, \ldots, x_n \in \mathcal{D}$, $f(x_1), \ldots, f(x_n)$ are independent and uniformly distributed over $\mathcal{R}$. More generally, we define the following:

Definition 1. Let $\mathcal{F} \subseteq \mathcal{R}^{\mathcal{D}}$ be a class of functions from $\mathcal{D}$ to $\mathcal{R}$. A random $\mathcal{F}$-function

$$f : \mathcal{D} \to \mathcal{R}$$

is a function chosen uniformly from $\mathcal{F}$.

Note that choosing a function uniformly from a class $\{f_\alpha\}_{\alpha \in I}$ indexed by some finite set $I$ can be achieved by choosing $\alpha_0$ uniformly from $I$ and then picking $f_{\alpha_0}$ as the chosen function.

Tweakable Random Permutation. When $\mathcal{R} = \mathcal{D}$, a popular choice of $\mathcal{F}$ is $\Pi_{\mathcal{D}}$, the class of all permutations on $\mathcal{D}$ (i.e., bijections from $\mathcal{D}$ to itself). A random permutation over $\mathcal{D}$ is a $\Pi_{\mathcal{D}}$-random function. It is an ideal choice corresponding to an encryption scheme over $\mathcal{D}$. The ideal choice corresponding to a tweakable enciphering scheme over $\mathcal{D}$ with tweak space $\mathcal{T}$ is called a tweakable random permutation $\tilde{\pi}$, which is chosen uniformly from the class $\Pi_{\mathcal{D}}^{\mathcal{T}}$. For each tweak $T \in \mathcal{T}$, we choose a random permutation $\pi_T$ independently, and $\tilde{\pi}$ is a stochastically independent collection of random permutations $\{\pi_T : T \in \mathcal{T}\}$.
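A tweakable random permutation is usually simulated by lazy sampling: each tweak gets its own table, and a fresh output is drawn only when a new input is queried. The following sketch illustrates that idea under assumptions of our own (a small integer domain, Python's non-cryptographic `random.Random` as the randomness source, hypothetical class and method names).

```python
import random


class LazyTweakablePermutation:
    """Lazily sampled family of independent random permutations, one per tweak."""

    def __init__(self, domain_size, seed=None):
        self.domain_size = domain_size
        self._rng = random.Random(seed)  # not cryptographic; for illustration only
        self._forward = {}   # tweak -> {x: y}
        self._backward = {}  # tweak -> {y: x}

    def _tables(self, tweak):
        return self._forward.setdefault(tweak, {}), self._backward.setdefault(tweak, {})

    def query(self, tweak, x):
        """Return pi_T(x), sampling it uniformly among unused outputs if new."""
        if not 0 <= x < self.domain_size:
            raise ValueError("input outside the domain")
        fwd, bwd = self._tables(tweak)
        if x not in fwd:
            free = [y for y in range(self.domain_size) if y not in bwd]
            y = self._rng.choice(free)
            fwd[x], bwd[y] = y, x
        return fwd[x]

    def inverse(self, tweak, y):
        """Return pi_T^{-1}(y), sampling it uniformly among unused inputs if new."""
        if not 0 <= y < self.domain_size:
            raise ValueError("output outside the domain")
        fwd, bwd = self._tables(tweak)
        if y not in bwd:
            free = [x for x in range(self.domain_size) if x not in fwd]
            x = self._rng.choice(free)
            fwd[x], bwd[y] = y, x
        return bwd[y]


pi = LazyTweakablePermutation(domain_size=8, seed=1)
outputs = [pi.query("T0", x) for x in range(8)]
print(sorted(outputs) == list(range(8)), pi.inverse("T0", outputs[3]) == 3)
```

Because each tweak has its own pair of tables, answers under different tweaks are sampled independently, matching the collection $\{\pi_T\}$ described above.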


2.2 Pseudorandomness and Distinguishing Games 

It should be noted that a random function or a random permutation is an ideal concept, since in practice the sizes of $\mathcal{R}^{\mathcal{D}}$ or $\Pi_{\mathcal{D}}$ are so huge that the cost of simulating a uniform random sampling on them is prohibitive. What is used instead of a truly random function is a pseudorandom function (PRF), a function whose behaviour is so close to that of a truly random function that no algorithm can effectively distinguish between the two. An adversary for a pseudorandom function $f_1$ is a deterministic algorithm $\mathcal{A}$ that tries to distinguish $f_1$ from a truly random $f_0$.

Security Notions. To test the pseudorandomness of $f_1$, $\mathcal{A}$ plays the PRF distinguishing game with an oracle $\mathcal{O}$ simulating (unknown to $\mathcal{A}$) either $f_1$ or $f_0$. For this, $\mathcal{A}$ makes $q$ queries, in a deterministic but possibly adaptive manner. It is well known that there is no loss in assuming that a distinguisher is deterministic, as an unbounded-time deterministic distinguisher is as powerful as a probabilistic distinguisher. Thus, the first query $x_1 = q_1()$ is fixed, and given the responses $y_j = \mathcal{O}(x_j)$, $j \in \{1, \ldots, i-1\}$, the $i$-th query becomes $x_i = q_i(y_1, \ldots, y_{i-1})$, where $q_i$ is a deterministic function for choosing the $i$-th query for $i \in \{1, \ldots, q\}$. Finally, a deterministic decision function examines $y_1, \ldots, y_q$ and chooses the output $b \in \{0,1\}$ of $\mathcal{A}$. $\mathcal{A}$ wins if $\mathcal{O}$ was simulating $f_b$. An equivalent way to measure this winning event is called the prf-advantage, defined as

$$\Delta_{\mathcal{A}}(f_0 ; f_1) := \mathbf{Adv}^{\mathrm{prf}}_{f_1}(\mathcal{A}) = \left|\Pr_{f_0}[\mathcal{A}^{f_0} \to 1] - \Pr_{f_1}[\mathcal{A}^{f_1} \to 1]\right|,$$

where $\Pr_f[\cdot]$ denotes the probability of some event when $\mathcal{O}$ imitates $f$. The above definition can be extended to more than one oracle. We can analogously define the pseudorandom permutation (PRP) advantage $\mathbf{Adv}^{\mathrm{prp}}_{f_1}(\mathcal{A})$ of $f_1$, in which case $f_0$ is the random permutation. When $f_1$ is an enciphering scheme and $\mathcal{A}$ is interacting with both $f_1$ and its inverse $f_1^{-1}$ (or with $f_0$ and $f_0^{-1}$) we have the strong pseudorandom permutation (SPRP) advantage

$$\mathbf{Adv}^{\mathrm{sprp}}_{f_1}(\mathcal{A}) = \left|\Pr_{f_0}[\mathcal{A}^{f_0, f_0^{-1}} \to 1] - \Pr_{f_1}[\mathcal{A}^{f_1, f_1^{-1}} \to 1]\right|.$$

Finally, for a tweakable enciphering scheme with the strong pseudorandom property as above, we analogously define the tweakable strong pseudorandom permutation (TSPRP) advantage $\mathbf{Adv}^{\mathrm{tsprp}}_{f_1}(\mathcal{A})$.
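For a very small domain the advantage of a fixed deterministic distinguisher can be computed exactly by enumerating every possible oracle. The sketch below does this for a hypothetical two-query "collision" distinguisher, taking $f_0$ to be a uniformly random function and $f_1$ a uniformly random permutation on four points; the names and the choice of distinguisher are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations, product


def collision_distinguisher(oracle):
    """Deterministic adversary: query 0 and 1, output 1 iff the answers collide."""
    return 1 if oracle(0) == oracle(1) else 0


def prob_output_one(tables):
    """Exact Pr[A outputs 1] when the oracle is uniform over the given function tables."""
    tables = list(tables)
    ones = sum(collision_distinguisher(lambda x, t=t: t[x]) for t in tables)
    return Fraction(ones, len(tables))


N = 4
all_functions = product(range(N), repeat=N)  # every function {0..3} -> {0..3}
all_permutations = permutations(range(N))    # every permutation of {0..3}
advantage = abs(prob_output_one(all_functions) - prob_output_one(all_permutations))
print(advantage)
```

A permutation never collides, while a random function collides on two fixed inputs with probability $1/N$, so this distinguisher's advantage is exactly $1/N$ here.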

Pointless Adversaries. In addition to the adversary being deterministic, we also assume that it does not make any pointless queries. An adversary $\mathcal{A}$ making queries to a tweakable encryption scheme $f$ and $f^{-1}$ is called pointless if either it makes a duplicate query or it makes an $f$-query $(T, P)$ and obtains response $C$ and an $f^{-1}$-query $(T, C)$ and obtains response $P$ (the order of these two queries can be reversed). We can assume that the adversary is not pointless since the responses are uniquely determined for these types of queries.

Theorem 1. [11] Let $f_1$ be a TES over a message space $\mathcal{M} \subseteq \{0,1\}^*$ and $f_0$ and $f_0'$ be two independently chosen random functions. Then for any non-pointless distinguisher $\mathcal{A}$ making at most $q$ queries, we have,

Advt/yV) < AAifufi 1 ) ; (/ 0 ,/o)) + fch 

where $m = \min\{\ell : \mathcal{M} \cap \{0,1\}^\ell \ne \emptyset\}$.

The above result says that a uniform length-preserving random permutation is very close to a uniform length-preserving random function.


2.3 Domain Extensions and Coefficient H Technique 

The notion of pseudorandomness, while giving us an approximate implementa- 
tion of random functions, introduces a new problem. In general, it is very hard to 
decide whether or not there is an adversary that breaks the pseudorandomness 
of a particular function, since there is no easy way of exhaustively covering all 
possible adversaries in an analysis, and since there is no true randomness in a 
practically implemented function, probabilistic arguments cannot be used. 

The common get-around is to assume we have PRFs $f_1, \ldots, f_n$ each with
domain $\mathcal{D}$ and use them to obtain an $F$ with domain $\mathcal{D}' \supset \mathcal{D}$, such that a PRF-attack
on $F$ leads to a PRF-attack on one of $f_1, \ldots, f_n$. Now, there are known functions
on small domains (like AES, for instance) which have withstood decades of
attempted PRF-attacks and are believed to be reasonably secure against PRF-attacks.
Choosing $\mathcal{D}$ suitably to begin with and using the known PRFs in our
construction, we can find a PRF $F$ with domain $\mathcal{D}'$ that is secure as long as the
smaller functions are secure. This technique is known as a domain extension.

Here, the central step in proving the security of $F$ is the reduction of an
adversary of $F$ to an adversary of one of $f_1, \ldots, f_n$. This reduction is achieved
by assuming $f_1, \ldots, f_n$ to be truly random, and giving an information-theoretic
proof that the distinguishing advantage of any adversary at $F$ is small. Thus, if
an adversary can distinguish $F$ from random with a reasonable advantage, we
must conclude that $f_1, \ldots, f_n$ are not truly random. Hence, all we need to show
is that when the underlying functions are truly random, $F$ behaves like a truly
random function.


[Figure: plaintext blocks $P_1, \ldots, P_4$ at the top, ciphertext blocks $C_4, \ldots, C_1$ at the bottom]

Fig. 3. The FMix construction for four blocks, with $M = V_4 + V'_4$


Patarin’s Coefficient H Technique. There are several techniques for showing
this. The one we use is based on the Coefficient H Technique, due to Jacques
Patarin, which we briefly describe here. We look at the queries $x_1, \ldots, x_q$ and
the outputs $y_1, \ldots, y_q$, and note that the adversary’s decision will be based solely
on the $2q$-tuple $(x_1, \ldots, x_q, y_1, \ldots, y_q)$. Now, if $F_0$ is the truly random function
$F$ is trying to emulate, then $F_0^{(q)}$ is also truly random, so on input $(x_1, \ldots, x_q)$,
$F_0^{(q)}(x_1, \ldots, x_q)$ will be uniform over $\mathcal{R}^q$, $\mathcal{R}$ being the range of $F$. Thus when
$\mathcal{D}' = \mathcal{R} = \{0,1\}^m$,

$\Pr[F_0^{(q)}(x_1, \ldots, x_q) = (y_1, \ldots, y_q)] = \frac{1}{2^{mq}}.$

If we can now show that $\Pr[F^{(q)}(x_1, \ldots, x_q) = (y_1, \ldots, y_q)]$ (which we call its
interpolation probability after Bernstein) is “very close” to $\frac{1}{2^{mq}}$ for most
$2q$-tuples $(x_1, \ldots, x_q, y_1, \ldots, y_q)$, we can conclude that no adversary can distinguish
$F$ from $F_0$ with a reasonable advantage. One way to formalize “very close” is
that the interpolation probability is at least $(1 - \epsilon) 2^{-mq}$. Moreover, this may not
happen for all possible views. (A view consists of all input and output blocks
taken together. Informally, it is the portion of the computations visible to the
adversary after completing all the queries.) So we may need to restrict the
interpolation calculation to so-called good views. This is the central idea of Patarin’s
technique.
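To make the notion of interpolation probability concrete, the following toy sketch enumerates every function and every permutation on a four-element domain and counts how many of them are consistent with a given view. The encoding of the domain as the integers 0 to 3 and all names are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import permutations, product


def interpolation_probability(candidates, xs, ys):
    """Fraction of candidate maps g (tuples indexed by input) with g[x_i] == y_i for all i."""
    candidates = list(candidates)
    hits = sum(all(g[x] == y for x, y in zip(xs, ys)) for g in candidates)
    return Fraction(hits, len(candidates))


N = 4  # toy domain and range {0,1}^2, encoded as 0..3
ALL_FUNCTIONS = list(product(range(N), repeat=N))    # 4^4 = 256 functions
ALL_PERMUTATIONS = list(permutations(range(N)))      # 4! = 24 permutations

if __name__ == "__main__":
    xs, ys = (0, 1), (3, 2)  # q = 2 distinct queries with distinct answers
    print(interpolation_probability(ALL_FUNCTIONS, xs, ys))
    print(interpolation_probability(ALL_PERMUTATIONS, xs, ys))
    # A view with colliding answers can never come from a permutation.
    print(interpolation_probability(ALL_PERMUTATIONS, xs, (3, 3)))
```

In this toy setting, a view with distinct answers is at least as likely under a random permutation as under a random function, while a view with colliding answers has probability zero under a permutation. The latter plays the role of a view that must be excluded before the interpolation probabilities are compared.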

Let $\mathrm{view}(A^{O})$ denote the view obtained by the adversary $A$ interacting
with $O$.

Theorem 2 (Coefficient H Technique [19]). Suppose the interpolation
probabilities follow the inequality

$\mathrm{IP}_F(V) \ge (1 - \epsilon) \cdot 2^{-nL}$

for all views $V \in \mathcal{V}_{good}$ (set of good views). Then for an SPRP-adversary $A$, we
have

$\mathrm{Adv}^{A}_{\mathrm{SPRP}}(F) \le \epsilon + \epsilon'$

where $\epsilon'$ denotes the probability $\Pr[\mathrm{view}(A^{F_0, F_0'}) \notin \mathcal{V}_{good}]$.

This technique was first introduced in Patarin’s PhD thesis [18] (as mentioned
in [24]) and was later formalized in [19].

3 The FMix Construction 

We are now in a position to describe our encryption scheme FMix. We use
one underlying block function, chosen from a keyed family of PRFs
$\{f_K : \{0,1\}^n \to \{0,1\}^n\}_{K \in \mathcal{K}}$. The extended domain, which serves as both $\mathcal{M}$ and
$\mathcal{C}$, is $\bigcup_{l \ge 2} \{0,1\}^{ln}$, all strings consisting of two or more n-bit blocks. In
addition to a key and a plaintext, the encryption algorithm also takes a tweak X as
input, which is also supplied to the decryption algorithm. Encryption is
length-preserving: for $m \in \{0,1\}^{l_0 n}$, $e(K, m, X) \in \{0,1\}^{l_0 n}$ as well. The basic structure
of the construction is based on that of CMC: a CBC encryption layer, followed by
a layer of mixing, followed by a CBC decryption layer. However, using a
generalisation of the Feistel scheme, we eliminate the need for $f_K^{-1}$ during decryption,
making do with $f_K$ instead, thus making this construction inverse-free (Fig. 3).

input : A tweak X, an integer $l \ge 2$, $l$ plaintext blocks $P_1, \ldots, P_l$
output: $l$ ciphertext blocks $C_1, \ldots, C_l$

begin
    $T \leftarrow f(X)$
    $V_0 \leftarrow T$
    for $i \leftarrow 1$ to $l - 1$ do
        $U_i \leftarrow V_{i-1} \oplus P_i$
        $V_i \leftarrow f(U_i)$
    end
    $U_l \leftarrow b(V_{l-1} \oplus P_l)$
    $V_l \leftarrow f(U_l)$
    $U'_l \leftarrow V_l \oplus U_1$
    $V'_l \leftarrow f(U'_l)$
    $M \leftarrow V_l \oplus V'_l$
    $U'_{l-1} \leftarrow U_2 \oplus M$
    $V'_{l-1} \leftarrow f(U'_{l-1})$
    $C_l \leftarrow V'_{l-1} \oplus b'(U'_l)$
    for $i \leftarrow 3$ to $l - 1$ do
        $U'_{l+1-i} \leftarrow U_i \oplus M$
        $V'_{l+1-i} \leftarrow f(U'_{l+1-i})$
        $C_{l+2-i} \leftarrow V'_{l+1-i} \oplus U'_{l+2-i}$
    end
    $U'_1 \leftarrow U_l \oplus V'_l$
    $V'_1 \leftarrow f(U'_1)$
    $V'_0 \leftarrow T$
    $C_2 \leftarrow V'_1 \oplus U'_2$
    $C_1 \leftarrow b(V'_0) \oplus U'_1$
end

Algorithm 1: FMix Encryption Algorithm. The decryption algorithm is
exactly the same as the encryption algorithm, except that $b(T)$ is computed in the first
layer and only $T$ is used in the second.


The details of the construction are demonstrated in the figure, which shows
a four-block FMix construction. The algorithm for general $l$ is described in the
box. Here, $b$ is a balanced linear permutation, which we define below, and $b'$ is
$b^{-1}$. Decryption is almost identical, just with $T$ and $b(T)$ switching roles.

Definition 2. A permutation $b : \{0,1\}^n \to \{0,1\}^n$ will be called a balanced
linear permutation if both $t \mapsto b(t)$ and $t \mapsto t + b(t)$ are linear permutations.

One choice of $b$ could be multiplication by a primitive element $\alpha$, but this is not very
software-friendly. A more software-friendly choice is $(t_1, t_2) \mapsto \ldots$, where
$t_1$ and $t_2$ are the higher and lower halves of $t$.
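The two conditions of the definition can be checked exhaustively for small block sizes. The sketch below tests a candidate half-swapping map, $(t_1, t_2) \mapsto (t_1 \oplus t_2, t_1)$, on 8-bit blocks. This particular map is an assumption chosen for illustration and is not necessarily the one intended in the text; the helper names are likewise hypothetical.

```python
N_BITS = 8
HALF = N_BITS // 2
LOW = (1 << HALF) - 1


def candidate_b(t):
    """Assumed example map: (hi, lo) -> (hi ^ lo, hi)."""
    hi, lo = t >> HALF, t & LOW
    return ((hi ^ lo) << HALF) | hi


def is_linear(g, n_bits=N_BITS):
    """Check g(x ^ y) == g(x) ^ g(y) for all pairs of n-bit inputs."""
    size = 1 << n_bits
    return all(g(x ^ y) == g(x) ^ g(y) for x in range(size) for y in range(size))


def is_permutation(g, n_bits=N_BITS):
    size = 1 << n_bits
    return len({g(x) for x in range(size)}) == size


def is_balanced_linear_permutation(g, n_bits=N_BITS):
    """Both t -> g(t) and t -> t ^ g(t) must be linear permutations."""
    def shifted(t):
        return t ^ g(t)
    return (is_linear(g, n_bits) and is_permutation(g, n_bits)
            and is_linear(shifted, n_bits) and is_permutation(shifted, n_bits))


if __name__ == "__main__":
    print(is_balanced_linear_permutation(candidate_b))
    # The identity is a linear permutation, but t ^ t = 0 is not a permutation.
    print(is_balanced_linear_permutation(lambda t: t))
```

The identity map illustrates why the second condition matters: it is a linear permutation, yet $t + b(t)$ collapses to zero.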

Notation for Our Construction. For our analysis we will assume the underlying
PRF to be a truly random function $f$. We now model the encryption scheme
in terms of computations based on $f$. An encryption is a computation

$C \leftarrow \mathcal{E}^f(T, P)$,

where $T \in \{0,1\}^n$, and $P, C \in \{0,1\}^{ln}$ for some $l \ge 2$. Similarly, a decryption is
a computation

$P \leftarrow \mathcal{D}^f(T, C)$,

which inverts $\mathcal{E}^f$, for any tweak $T$. The plaintext $P$ is denoted $(P_1, \ldots, P_l)$, where
each $P_i$ is an n-bit block of $P$. Similarly, the ciphertext $C$ is denoted $(C_1, \ldots, C_l)$.

In the TSPRP game, the adversary makes $q$ queries to the oracle $O$. Each query
is of the form $(\delta, T, X)$, where $\delta \in \{e, d\}$ denotes the direction of the query,
$T \in \{0,1\}^n$ is the tweak, and $X \in \{0,1\}^{nl}$ for some $l$ is the input. If $O$ is
imitating FMix, $O(e, T, X)$ returns $\mathcal{E}^f(T, X)$, and $O(d, T, X)$ returns $\mathcal{D}^f(T, X)$.
If $O$ is imitating a tweaked PRP $\Pi$, $O(e, T, X)$ returns $\Pi(T, X)$, and $O(d, T, X)$
returns $\Pi^{-1}(T, X)$. The output of $O$ is denoted $Y$.

All the queries and their outputs taken together form what we call a view. We
use the following notation in a view. For the $i$-th query, $\delta^i$ denotes the direction
of the query, $T^i$ denotes the tweak, and $l^i$ denotes the number of blocks in $X$.
When $\delta^i = e$, the blocks of $X$ are denoted $P^i_1, \ldots, P^i_{l^i}$ and those of $Y$ are denoted
$C^i_1, \ldots, C^i_{l^i}$. When $\delta^i = d$, this notation is reversed, i.e., the blocks of $Y$ are
denoted $P^i_1, \ldots, P^i_{l^i}$ and those of $X$ are denoted $C^i_1, \ldots, C^i_{l^i}$. In the analysis, the
tweak $T^i$ is denoted both $P^i_0$ and $C^i_0$.

4 TSPRP Security Analysis of FMix 

4.1 Good Views and Interpolation 

Our first task is to formulate the version of Patarin’s Coefficient H Technique 
we shall use for our proof. We begin by restricting our attention to a particular 
class of views. 

Pointless View. A view is an indexed set of tuples

$V = \{(\delta^i, T^i, l^i, P^i_j, C^i_j) \mid 1 \le i \le q, 1 \le j \le l^i\}$.

Here $\delta^i$ can take values $e$ and $d$ only. The $l^i$’s are positive integers and
$T^i, P^i_j, C^i_j \in \{0,1\}^n$, called blocks. The $P^i_j$ and $C^i_j$ mean the $j$-th block of plaintext
and ciphertext respectively on the $i$-th query. We denote $T^i$ by both $P^i_0$ and $C^i_0$.
For any $0 \le a \le b \le l^i$, we write $P^i_{a..b}$ to represent the tuple $(P^i_a, \ldots, P^i_b)$ and
$P^i$ to denote $P^i_{0..l^i}$. We use similar notation for $C^i$ and $C^i_{a..b}$. A view $V$ is said to be
pointless if at least one of the following holds:

1. $\exists\, i \ne i'$ such that $\delta^i = \delta^{i'} = e$, $P^i = P^{i'}$.

2. $\exists\, i \ne i'$ such that $\delta^i = \delta^{i'} = d$, $C^i = C^{i'}$.

3. $\exists\, i' < i$ such that $\delta^i = e$, $\delta^{i'} = d$, $P^i = P^{i'}$.

4. $\exists\, i' < i$ such that $\delta^i = d$, $\delta^{i'} = e$, $C^i = C^{i'}$.





The first two cases are for duplicate queries. The third holds when we obtain
a response $P^{i'}$ for some decryption query $C^{i'}$ and then make an encryption
query $P^i := P^{i'}$. (The fourth case is the third case with the order of the queries
reversed.) It is easy to see that when an adversary $A$ is interacting with a TES,
the view obtained is pointless if and only if $A$ is pointless.

As we do not allow a pointless adversary, we can restrict ourselves to
non-pointless views only. Now we define good and bad views among this class.

Definition 3. (Good and Bad Views). A view $\{(\delta^i, T^i, l^i, P^i_j, C^i_j) \mid 1 \le i \le
q, 1 \le j \le l^i\}$ is said to be good if it is not pointless and

$(\forall i \text{ with } \delta^i = e)(\nexists\, i' < i)(C^i_1 = C^{i'}_1)$, and $(\forall i \text{ with } \delta^i = d)(\nexists\, i' < i)(P^i_1 = P^{i'}_1)$.

A view that is not good and not pointless is called bad.

The proof revolves around showing that the good views have a near-random 
distribution, and the bad views occur with a low probability. For the rest of the 
analysis, we fix a good view V. 
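The classification of views into pointless, good and bad can be phrased as a pair of simple checks. The sketch below assumes a view is stored as a list of queries in the order they were made, with blocks represented as integers. The `Query` record and the function names are hypothetical and exist only for this illustration.

```python
from typing import List, NamedTuple, Tuple


class Query(NamedTuple):
    direction: str               # "e" for encryption, "d" for decryption
    tweak: int
    plaintext: Tuple[int, ...]   # blocks P_1..P_l
    ciphertext: Tuple[int, ...]  # blocks C_1..C_l


def is_pointless(view: List[Query]) -> bool:
    """True if a later query repeats (tweak, input) of an earlier query or of its answer."""
    for i, cur in enumerate(view):
        for prev in view[:i]:
            same_p = (prev.tweak, prev.plaintext) == (cur.tweak, cur.plaintext)
            same_c = (prev.tweak, prev.ciphertext) == (cur.tweak, cur.ciphertext)
            if cur.direction == "e" and same_p:   # cases 1 and 3
                return True
            if cur.direction == "d" and same_c:   # cases 2 and 4
                return True
    return False


def is_good(view: List[Query]) -> bool:
    """Not pointless, and the first output block of each query is new."""
    if is_pointless(view):
        return False
    for i, cur in enumerate(view):
        for prev in view[:i]:
            if cur.direction == "e" and cur.ciphertext[0] == prev.ciphertext[0]:
                return False
            if cur.direction == "d" and cur.plaintext[0] == prev.plaintext[0]:
                return False
    return True


def is_bad(view: List[Query]) -> bool:
    return not is_pointless(view) and not is_good(view)


if __name__ == "__main__":
    view = [Query("e", 0, (1, 2), (7, 8)), Query("e", 0, (1, 3), (7, 5))]
    print(is_pointless(view), is_good(view), is_bad(view))
```

The four pointless cases collapse into two tests because, in each of them, the later query supplies an input whose answer is already determined by an earlier query.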


Interpolation Probability. Now we consider the interpolation probability for
the FMix construction. It is easy to see that

$\mathrm{IP}^f_{\mathrm{FMix}}(V) = \Pr_f[\mathrm{FMix}^f(T^i, P^i_{1..l^i}) = C^i_{1..l^i}, 1 \le i \le q]$

where the probability is taken under the randomness of $f$ chosen uniformly
from the set of all functions from $\{0,1\}^n$ to itself. Similarly, the interpolation
probability for an ideal random function $\mathrm{IP}^*(V)$ is $2^{-nL}$ where $L = \sum_{i=1}^{q} l^i$.
This corresponds to the case where $O$ is imitating a truly random function. Now
we state a result, the proof of which is deferred to the next section.

Proposition 1. For any good view $V$,

$\mathrm{IP}^f_{\mathrm{FMix}}(V) \ge (1 - \epsilon) \times 2^{-nL}$, where $\epsilon = \binom{2L}{2} / 2^n$.

Armed with this result and the Coefficient H Technique, we are now ready to 
state and prove the main result of this paper. 


Theorem 3. For any SPRP-adversary $A$ making $q$ queries with $L$ blocks in all,

$\mathrm{Adv}^{A}_{\mathrm{SPRP}}(\mathrm{FMix}) \le \frac{\binom{2L}{2} + \binom{q}{2}}{2^n}$.

Proof. When a non-pointless adversary $A$ is interacting with a pair of
independent random functions $(f_0, f_0')$, the probability that it obtains a bad view is upper
bounded by $\binom{q}{2} / 2^n$. To see this, let the bad event occur for the first time at the $i$-th
query. If it is an encryption query (a similar proof can be carried out for a
decryption query) then $C^i_1$ is chosen randomly from $\{0,1\}^n$, and so the probability that it matches
one of the previous first ciphertext blocks is at most $(i - 1)/2^n$. So

$\Pr[\mathrm{view}(A^{f_0, f_0'}) \text{ is a bad view}] \le \sum_{i=1}^{q} \frac{i-1}{2^n} = \frac{\binom{q}{2}}{2^n}$.

By using the Coefficient H Technique (see Sect. 2.3) and the proposition stated
above we have proved our theorem. □

Corollary 1. Let $\mathrm{FMix}_K$ denote the FMix construction based on the keyed
blockcipher $f_K$. For any TSPRP-adversary $A$ making $q$ queries with $L$ blocks in all
there exists an adversary $A'$ making at most $L$ encryption queries (and similar
time as $A$) such that

$\mathrm{Adv}^{\mathrm{tsprp}}_{\mathrm{FMix}_K}(A) \le \mathrm{Adv}^{\mathrm{prf}}_{f}(A') + \frac{\binom{2L}{2} + \binom{q}{2}}{2^n}$.

This follows from the standard hybrid argument.
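To get a feeling for the size of the information-theoretic term, it can be evaluated exactly for concrete parameters. The sketch below assumes the bound has the form $(\binom{2L}{2} + \binom{q}{2}) / 2^n$, as in the bad-view estimate of the proof. The function names and the sample parameters ($n = 128$, $2^{10}$ queries, $2^{20}$ blocks) are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def bad_view_bound(q: int, n: int) -> Fraction:
    """Sum over i = 1..q of (i - 1) / 2^n, which equals C(q, 2) / 2^n."""
    return Fraction(comb(q, 2), 2 ** n)


def fmix_information_theoretic_bound(total_blocks: int, q: int, n: int) -> Fraction:
    """(C(2L, 2) + C(q, 2)) / 2^n for L = total_blocks."""
    return Fraction(comb(2 * total_blocks, 2) + comb(q, 2), 2 ** n)


if __name__ == "__main__":
    bound = fmix_information_theoretic_bound(total_blocks=2 ** 20, q=2 ** 10, n=128)
    print(float(bound))
```

The dominant contribution comes from the $\binom{2L}{2}$ term, so the bound degrades roughly quadratically in the total number of blocks processed.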


4.2 Extension of FMix for Partial Block Input 

In Sect. 3, we define our construction only for complete block inputs. In practice,
message lengths $m$ may not be a multiple of the block length $n$. For a complete
enciphering scheme, our message space needs to be extended to include these partial
block inputs. Two known methods for message-space extension of a cipher are
XLS [20] and Nandi’s scheme [14]. XLS is now known to be insecure [15], so we
use Nandi’s generic scheme for extending the message space. The generic
construction requires two additional blockcipher keys. We write these blockciphers
as $f_2$ and $f_3$. The blockcipher $f_1$ is used in FMix. Given any partial block $x$,
$1 \le |x| \le n - 1$, we write $\mathrm{pad}(x) = x \| 1 \| 0^{n-1-|x|}$. Similarly, $\mathrm{chop}_r(x)$ denotes the
first $r$ bits of $x$.
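The two helper operations are simple enough to state directly in code. The sketch below assumes blocks are represented as strings of the characters 0 and 1, which is a choice made only for readability. The `unpad` helper is a hypothetical addition that shows the padding is unambiguous.

```python
def pad(x: str, n: int) -> str:
    """Pad a partial block x (1 <= len(x) <= n - 1) to n bits: x || 1 || 0^(n-1-|x|)."""
    if not 1 <= len(x) <= n - 1:
        raise ValueError("partial block must have between 1 and n - 1 bits")
    return x + "1" + "0" * (n - 1 - len(x))


def chop(x: str, r: int) -> str:
    """Return the first r bits of x."""
    return x[:r]


def unpad(y: str) -> str:
    """Recover x from pad(x): drop the trailing zeros and the final 1."""
    return y.rstrip("0")[:-1]


if __name__ == "__main__":
    padded = pad("101", 8)
    print(padded, unpad(padded), chop(padded, 3))
```

Because the appended 1 marks where the original bits end, two different partial blocks never pad to the same full block.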


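input : A tweak X, an integer $l \ge 2$, $l - 1$ complete plaintext blocks
        $P_1, \ldots, P_{l-1}$, partial last plaintext block $p_l$
output: $l - 1$ complete ciphertext blocks $C_1, \ldots, C_{l-1}$, partial last ciphertext
        block $c_l$

begin
    $P'_{l-1} \leftarrow f_2(\mathrm{pad}(p_l)) \oplus P_{l-1}$
    $(C_1, \ldots, C_{l-2}, C'_{l-1}) \leftarrow \mathrm{FMix}^{f_1}(P_1, \ldots, P_{l-2}, P'_{l-1})$
    $c_l \leftarrow \mathrm{chop}_{|p_l|}(f_3(P'_{l-1} \oplus C'_{l-1})) \oplus p_l$
    $C_{l-1} \leftarrow f_2(\mathrm{pad}(c_l)) \oplus C'_{l-1}$
end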


Theorem 4. For any SPRP-adversary $A$ making $q$ queries with $L$ blocks
(including incomplete) in all,

$\mathrm{Adv}^{\mathrm{sprp}}(A) \le \frac{\binom{2L}{2} + \binom{q}{2}}{2^n} + \frac{3q(q-1)}{2^{n+1}}$.

The proof of the statement is immediate from Theorem 1 and the generic
conversion as described in [14].


5 Proof of Proposition 1 


In this section we provide the proof of Proposition 1.

Proposition. For any good view $V$,

$\mathrm{IP}^f_{\mathrm{FMix}}(V) \ge (1 - \epsilon) \times 2^{-nL}$, where $\epsilon = \binom{2L}{2} / 2^n$.

We find a lower bound for the probability on the left by counting the choices of
$f$ that give rise to $V$. For this counting, we find the number of internal states
(simulations) $\sigma$ that can result in $V$, and for each $\sigma$, the number of choices of
$f$ compatible with it. As it turns out, slightly undercounting the simulations
(counting only what we call admissible simulations) will suffice to prove our
security bound.

5.1 Simulations 

We shall develop an effective way of calculating the interpolation probability
of $V$. We begin by introducing the notion of variables. Let $E$ be the set of all
encryption query indices, i.e., $E = \{i \mid \delta^i = e\}$. Similarly, let $D$ be the set of all
decryption query indices. In identifying and labelling internal blocks, we continue
using superscripts to denote query indices. Thus, for a query $i$, the $2l^i$ inputs of
$f$ (other than $T^i$) are denoted $U^i_1, \ldots, U^i_{l^i}, U'^i_1, \ldots, U'^i_{l^i}$, and the $2l^i + 1$ outputs of $f$
are denoted $V^i_0, V^i_1, \ldots, V^i_{l^i}, V'^i_1, \ldots, V'^i_{l^i}$. For ease of notation, we shall write both
$U^i_0$ and $U'^i_0$ to denote $T^i$.

Variables and Derivables. We pick a set of output blocks

$S = \{V^i_j \mid i \in E, j \in \{1, \ldots, l^i\}\} \cup \{V'^i_j \mid i \in D, j \in \{1, \ldots, l^i\}\}$.

$S$ will be our set of primary variables, or simply variables. Any non-trivial
linear combination of variables, optionally including blocks from $V$ as well, will
be called a derivable. While the proof will primarily depend on variables,
derivables will serve in the proof mainly to simplify notation and make the proof
easier to grasp. Examples of derivables would be $U^i_3$, $V^i_1 + V'^i_1$ and $V^i_1 + P^i_2$. Note
that a linear combination of view blocks alone, say $C^i_1 + C^{i'}_1$, will not be considered
a derivable, since its value has already been fixed by choosing $V$.

Let us assume for now that the input block and its corresponding output 
block are unrelated. We note that all input and output blocks of / are either 
variables or derivables. Thus, if we assign values to the variables, all the inputs 
and outputs of / over all queries are linearly determined. Thus, the variables 
linearly generate the entire set of input and output blocks, while themselves 
being linearly independent. We now formalise the notion of value assignment to 
variables. 

Definition 4. A transcript $\tau$ is a collection of variable-value pairs $(Z, v)$ such
that no two pairs in the collection contain the same variable. For every $(Z, v) \in \tau$,
the variable $Z$ is said to be assigned the value $v$ under $\tau$. We denote this as $Z|_\tau = v$.
The domain $D(\tau)$ of a transcript $\tau$ is defined as $\{Z \mid (\exists v)(Z, v) \in \tau\}$. Given a set
$S$ of variables, a transcript $\tau$ with $D(\tau) = S$ is said to be an instantiation of $S$.

For a transcript $\tau$ and a derivable $Z'$ whose value only depends on the variables
in $D(\tau)$, $\tau$ effectively determines a value for $Z'$. This value is denoted by $Z'\|_\tau$.
For ease of notation, for any view block $X$, $X\|_\tau$ will simply denote the value
of $X$ fixed in $V$. An instantiation $\sigma$ of $S$ will be called a simulation, since it
determines all inputs and outputs of $f$ and thus describes a complete simulation
of the internal computations that resulted in view $V$.

Not all simulations make sense, however, when we consider the connection
between an input block and its corresponding output block. A dependence now
creeps in among the variables, owing to the key observation below, which poses
the only non-trivial questions in the entire proof.

Wherever the inputs of / are identical, so are its outputs. 

There can be simulations which violate this rule, and thus describe internal com- 
putations that can never occur. A simulation which actually describes a possible 
set of internal computations is called realisable. It is immediately clear that our 
observation holds for all realisable simulations. The problem of calculating the 
interpolation probability of V boils down to counting the number of realisable 
simulations. 


5.2 Admissibility 

All realisable simulations can be difficult to count, however. We shall focus
instead on a smaller class of simulations, called admissible simulations, which
are easy to count and yet are abundant enough to give us the desired result.
Before that, let us formulate in specific terms the ramifications of this
observation. The immediate consequence is what we call pre-destined collisions. Let
$\mathcal{I} = \bigcup_i \{U^i_0, U^i_1, \ldots, U^i_{l^i}, U'^i_1, \ldots, U'^i_{l^i}\}$ be the set of all input blocks of $f$.

Definition 5. A pair of input blocks $Z_1, Z_2 \in \mathcal{I}$ is said to constitute a
pre-destined collision if for any realisable simulation $\sigma$,

$Z_1\|_\sigma = Z_2\|_\sigma$.

All other collisions between input blocks are called accidental collisions. Our 
next task is to identify all pre-destined collisions. For that we’ll need some more 
definitions. 

Definition 6. Query indices $i$ and $i'$ are called k-encryption equivalent for
some $k < \min(l^i, l^{i'})$ if either $i = i'$, or

This is denoted as $i \sim_{e_k} i'$. Similarly, $i$ and $i'$ are called k-decryption
equivalent for some $k < \min(l^i, l^{i'})$ if either $i = i'$, or


This is denoted as $i \sim_{d_k} i'$.

Note that if $i \sim_{e_k} i'$, then $(\forall k' \le k)(i \sim_{e_{k'}} i')$, and similarly for decryption
equivalence. Our choice of $V$ as a good view ensures that $i \in E$ whenever $i \sim_{e_1} i'$
for some $i' < i$, and $i \in D$ whenever $i \sim_{d_1} i'$ for some $i' < i$. We can now make
a list of pre-destined collisions:

- $(U^i_k, U^{i'}_k)$, $0 \le k < \min(l^i, l^{i'})$, $i \sim_{e_k} i'$

- $(U'^i_k, U'^{i'}_k)$, $0 \le k < \min(l^i, l^{i'})$, $i \sim_{d_k} i'$

Substituting $V^i_{k-1} + P^i_k$ for $U^i_k$ and $V'^i_{k-1} + C^i_k$ for $U'^i_k$, we can re-write the
pre-destined collisions as

- $(V^i_{k-1} + P^i_k, V^{i'}_{k-1} + P^{i'}_k)$, $0 < k < \min(l^i, l^{i'})$, $i \sim_{e_k} i'$

- $(V'^i_{k-1} + C^i_k, V'^{i'}_{k-1} + C^{i'}_k)$, $0 < k < \min(l^i, l^{i'})$, $i \sim_{d_k} i'$

List of Pre-destined Collisions. By our Observation, a pre-destined collision
on inputs naturally entails a collision on the corresponding outputs. This leads
to a corresponding set of pre-destined output collisions, which we write in
the form of equations over derivables and view blocks:

(a) $(i \sim_{e_k} i') \Rightarrow (V^i_k = V^{i'}_k)$, $0 \le k < \min(l^i, l^{i'})$,

(b) $(i \sim_{d_k} i') \Rightarrow (V'^i_k = V'^{i'}_k)$, $0 \le k < \min(l^i, l^{i'})$.

The pre-destined output collisions linearly follow from the pre-destined collisions,
but are formulated separately here, because they’ll later be useful as a class of
constraints on realisable simulations. Finally, we define the class of admissible
simulations.

Definition 7 (Admissible). A simulation $\sigma$ is called admissible if, for any
$Z_1, Z_2 \in \mathcal{I}$ that do not constitute a pre-destined collision, $Z_1\|_\sigma \ne Z_2\|_\sigma$.

Thus, in an admissible simulation, no two input blocks of / can accidentally 
collide, and the only collisions are the pre-destined ones. 
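Admissibility is a purely combinatorial condition on the values taken by the input blocks, so it can be checked mechanically once those values and the pre-destined pairs are known. The sketch below assumes the input blocks are given as a dictionary from block labels to values and the pre-destined collisions as a set of unordered label pairs. The labels and function names are hypothetical.

```python
from itertools import combinations


def accidental_collisions(inputs, predestined):
    """Pairs of input blocks with equal values that are not pre-destined collisions."""
    return [
        (a, b)
        for a, b in combinations(sorted(inputs), 2)
        if inputs[a] == inputs[b] and frozenset((a, b)) not in predestined
    ]


def is_admissible(inputs, predestined):
    """A simulation is admissible when it has no accidental input collision."""
    return not accidental_collisions(inputs, predestined)


if __name__ == "__main__":
    # Queries 1 and 2 share their first plaintext block, so their first inputs must collide.
    predestined = {frozenset(("U1^1", "U1^2"))}
    inputs = {"U1^1": 0x3C, "U1^2": 0x3C, "U2^1": 0x51, "U2^2": 0x9A}
    print(is_admissible(inputs, predestined))
    inputs["U2^2"] = 0x51  # an accidental collision with U2^1
    print(accidental_collisions(inputs, predestined))
```

Counting admissible simulations rather than all realisable ones trades a slight undercount for a condition that is this easy to verify.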


5.3 Basis and Extension 

We now identify a subclass $B$ of the variables which are linearly independent
under the assumption of admissibility, and such that an instantiation $\tau_B$ of $B$ admits
a unique extension $E(\tau_B)$ to a realisable simulation. We shall call $B$ a basis of
$S$. First, we’ll need one more definition.

Definition 8. A query index $i$, $1 \le i \le q$, is called k-fresh, $k \ge 0$, if $k = l^i$, or
$k < l^i$ and $\nexists\, i' < i$ with $k < l^{i'}$ such that $i \sim_{e_k} i'$ or $i \sim_{d_k} i'$.

The set $E_k$ of k-fresh encryption queries is defined as $\{i \mid \delta^i = e, i \text{ k-fresh}\}$.
Similarly, the set $D_k$ of k-fresh decryption queries is defined as $\{i \mid \delta^i = d, i \text{ k-fresh}\}$.
Clearly, $E = \bigcup_k E_k$, and $D = \bigcup_k D_k$, since any $i$ is $l^i$-fresh.

We are now in a position to choose our basis $B$. Let $l = \max_i l^i$. We define
the following:

$B_j = \{V^i_j \mid i \in E_j\}$, $0 \le j \le l$,

$B'_j = \{V'^i_j \mid i \in D_j\}$, $0 \le j \le l$.

Finally, we define our basis as

$B = \bigcup_{j=0}^{l} (B_j \cup B'_j)$.

We next show how to obtain $\sigma = E(\tau_B)$ given an instantiation $\tau_B$ of $B$. To
simplify the description, we shall use a couple of new definitions.

Definition 9. The encryption k-ancestor of a query index $i$ is defined as

$A^e_k(i) = \min_{i \sim_{e_k} i'} i'$.

Similarly, the decryption k-ancestor of a query index $i$ is defined as

$A^d_k(i) = \min_{i \sim_{d_k} i'} i'$.

Clearly, if $i$ is k-fresh, then $i$ is its own k-ancestor.

Definition 10. For a query index $i$ and a transcript $\tau$, the query slice at $i$ of
$\tau$ is defined as

$Q_i(\tau) = \{(Z^i, v) \mid (Z^i, v) \in \tau\}$.

Thus, a query slice is the portion of a transcript that refers to a specific query.
The query slices of a transcript form a partition of it.

We are now ready to describe how to uniquely obtain $\sigma$. To begin with, for
all $Z \in B$, we set

$Z|_\sigma = Z|_{\tau_B}$.

This gives us, among other things, the complete $Q_1(\sigma)$. (To see why, assume
without loss of generality that $\delta^1 = e$. Then $1 \in E_j$ for every $j$, so $V^1_j \in B$ for
$0 \le j \le l^1$.) We proceed inductively to determine $Q_i(\sigma)$ based on $Q_1(\sigma), \ldots, Q_{i-1}(\sigma)$.

Suppose we have determined $Q_{i'}(\sigma)$ for all $i' < i$. For $0 \le j \le l^i$, let $i_j$ denote
$A^{\delta^i}_j(i)$. Clearly, $\{i_j\}_j$ form a non-decreasing sequence, and $i_{l^i} = i$. Let

$k = \min_{i_j = i} j$.

Suppose without loss of generality that $\delta^i = d$. Thus, for all $j \ge k$, $i \in D_j$. So
$V'^i_j \in B$ for every $k \le j \le l^i$. For $0 \le j < k$, since $i \sim_{d_j} i_j$, and $i_j$ is decryption
j-fresh, we use 4.3 (b) to set

$V'^i_j|_\sigma = V'^{i_j}_j|_\sigma$.

Finally, we set

$V^i_0|_\sigma = V'^i_0\|_\sigma$.

This completes our extension of $\tau_B$.

To show that $\sigma$ indeed is a simulation, we just observe that if $\bigcup_{i'=1}^{i-1} Q_{i'}(\sigma)$
is realisable, and $\delta^i = d$, then $Q_i(\sigma)$ cannot violate 4.3 (a) (which concerns
encryption queries only), and $Q_i(\sigma)$ is chosen so as to conform to 4.3 (b).


5.4 Extension Equations 


We observe that in extending $\tau_B$ to $E(\tau_B)$, once we’ve set the basis variables
in accordance with $\tau_B$, none of the steps we perform thereafter depend on the
specific instantiation $\tau_B$. Thus, for each variable we can identify an equation
relating it to the basis variables, so that a simulation can be obtained by
simply plugging in an appropriate instantiation of $B$. We call these equations the
extension equations.

Pick $i \in E$, $j \in \{0, \ldots, l^i\}$. Then $V^i_j$ is a variable. Let $b_1$ be $j$, and $a_1$ be $A^e_j(i)$.
Having obtained $b_1, \ldots, b_k$ and $a_1, \ldots, a_k$, we stop if $k$ is odd and $a_k \in E$, or if $k$
is even and $a_k \in D$. Otherwise, let $b_{k+1} = l^{a_k} - 1 - b_k$, and $a_{k+1}$ be $A_{b_{k+1}}(a_k)$.

Since $a_{k+1} \le a_k$, this terminates after finitely many steps, say upon obtaining
$a_{k_0}$. Then we call $((b_1, a_1), \ldots, (b_{k_0}, a_{k_0}))$ the extension chain of $V^i_j$, denoted
$\mathcal{C}(V^i_j)$.

To obtain the extension equation of $V^i_j$ from $\mathcal{C}(V^i_j)$, note that $V^i_j = V^{a_1}_{b_1}$, and
for any even $k \le k_0$, $V'^{a_k}_{b_k} = V'^{a_{k-1}}_{b_k}$ and (if $k < k_0$) $V^{a_{k+1}}_{b_{k+1}} = V^{a_k}_{b_{k+1}}$. To bridge
these equations, we just need to recall the equations relating $V^{i'}_j$ to $V'^{i'}_{l^{i'}-1-j}$ for
arbitrary $i'$ with $l^{i'} > j$.

From our algorithm, $V'^{i'}_0 = V^{i'}_0$, $V'^{i'}_{l^{i'}} = b(V^{i'}_{l^{i'}-1} + V^{i'}_0 + P^{i'}_{l^{i'}}) + C^{i'}_1$ and

$V'^{i'}_{l^{i'}-1} = b'(V^{i'}_{l^{i'}} + V^{i'}_0 + P^{i'}_1) + C^{i'}_{l^{i'}}$.

For $1 \le j \le l^{i'} - 2$, recall the masking equation

$V'^{i'}_j = V^{i'}_{l^{i'}-1-j} + V^{i'}_{l^{i'}} + V'^{i'}_{l^{i'}} + P^{i'}_{l^{i'}-j} + C^{i'}_{j+1}$.

On replacing $V'^{i'}_{l^{i'}}$ by $b(V^{i'}_{l^{i'}-1} + V^{i'}_0 + P^{i'}_{l^{i'}}) + C^{i'}_1$, this becomes

$V'^{i'}_j = V^{i'}_{l^{i'}-1-j} + V^{i'}_{l^{i'}} + b(V^{i'}_{l^{i'}-1} + V^{i'}_0 + P^{i'}_{l^{i'}}) + C^{i'}_1 + P^{i'}_{l^{i'}-j} + C^{i'}_{j+1}$.

The extension equations can be computed inductively using the above. Similarly,
for derivables, we can get the extension equations by writing them in terms of
variables, and expanding these variables through their corresponding extension
equations. We’ll mostly be interested in the set of basis variables appearing in
the extension equation of an input derivable $Z$, called the base $\mathfrak{B}(Z)$ of $Z$.

We’ll show that whenever for two input derivables $Z$ and $Z'$, $\mathfrak{B}(Z) = \mathfrak{B}(Z')$,
$(Z, Z')$ is either a pre-destined collision, or $Z$ and $Z'$ cannot collide. This’ll show
that every accidental input collision corresponds to a linear equation on the basis
variables and view blocks. Note that this linear equation actually corresponds to
$n$ linear equations in terms of the bits, all of which should be dodged. For most
of the analysis, this distinction will not matter, and it’ll only become important
when we deal with two special cases in the very end.

Lemma 1. Every accidental input collision imposes a non-trivial linear equa- 
tion on the basis variables. 

The proof of the lemma is postponed to the end of this section. It basically 
considers all cases for accidental collision and shows that it gives a non-trivial 
linear equation. 


5.5 Bringing It All Together 

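We are now ready to wrap up our proof of Proposition 1. Let $L$ denote $\sum_i l^i$.
Let $\epsilon = \binom{2L}{2} / 2^n$. The total number of output bits in $V$ is $nL$, so clearly

$\mathrm{IP}^*(V) = \frac{1}{2^{nL}}$.

Now, let $\mathcal{F} \subseteq (\{0,1\}^n)^{\{0,1\}^n}$ be such that $(f \in \mathcal{F}) \leftrightarrow$ (choosing $f$ results in $V$).
We see that

$\mathrm{IP}^f_{\mathrm{FMix}}(V) = \frac{\#\,\text{choices of } f \text{ which result in } V}{\#\,\text{choices of } f \text{ in all}} = \frac{|\mathcal{F}|}{(2^n)^{2^n}}$.

Let $\mathfrak{A}$ be the set of all admissible simulations. For an admissible simulation $\sigma$,
let $\mathcal{F}_\sigma$ denote the subset of $\mathcal{F}$ such that $(f \in \mathcal{F}_\sigma) \leftrightarrow$ (choosing $f$ results in $V$
and $\sigma$). With this notation, we can write

$|\mathcal{F}| \ge \sum_{\sigma \in \mathfrak{A}} |\mathcal{F}_\sigma|$.

To calculate $|\mathcal{F}_\sigma|$, we note that $\sigma$ fixes the values of $f$ for $L + |B|$ distinct inputs.
Thus,

$|\mathcal{F}_\sigma| = (2^n)^{2^n - L - |B|}$.

Since this does not depend on $\sigma$, we can write

$\sum_{\sigma \in \mathfrak{A}} |\mathcal{F}_\sigma| = |\mathfrak{A}| \cdot (2^n)^{2^n - L - |B|}$.

Now, each admissible simulation is $E(\tau_B)$ for some instantiation $\tau_B$ of $B$. To
ensure $E(\tau_B) \in \mathfrak{A}$, we just have to choose $\tau_B$ such that it dodges all the linear
equations corresponding to accidental input collisions. As there can be at most
$\binom{2L}{2}$ such equations, we conclude that

$|\mathfrak{A}| \ge 2^{n|B|} - \binom{2L}{2} \cdot 2^{n(|B|-1)} = 2^{n|B|}(1 - \epsilon)$.

Putting all of this together, we get

$|\mathcal{F}| \ge (2^n)^{2^n - L} \cdot (1 - \epsilon) = (1 - \epsilon) \cdot \mathrm{IP}^*(V) \cdot (2^n)^{2^n}$,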


from which the Proposition follows. 
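The arithmetic of this counting argument can be checked exactly on toy parameters. The sketch below assumes that the number of admissible simulations is bounded below by $2^{n|B|} - \binom{2L}{2} \cdot 2^{n(|B|-1)}$ and that each simulation fixes $f$ on $L + |B|$ inputs, and confirms that the resulting fraction of functions equals $(1 - \epsilon) \cdot 2^{-nL}$. The function name and the sample values are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def counting_lower_bound(n: int, total_blocks: int, basis_size: int):
    """Return (lower bound on |F| / (2^n)^(2^n), (1 - eps) * 2^(-n * L)) as exact fractions."""
    size = 2 ** n
    if size < total_blocks + basis_size:
        raise ValueError("need 2^n >= L + |B| for the toy count to make sense")
    pairs = comb(2 * total_blocks, 2)
    eps = Fraction(pairs, size)
    admissible = size ** basis_size - pairs * size ** (basis_size - 1)
    per_simulation = size ** (size - total_blocks - basis_size)
    all_functions = size ** size
    lhs = Fraction(admissible * per_simulation, all_functions)
    rhs = (1 - eps) * Fraction(1, size ** total_blocks)
    return lhs, rhs


if __name__ == "__main__":
    lhs, rhs = counting_lower_bound(n=4, total_blocks=3, basis_size=4)
    print(lhs == rhs, lhs)
```

The size of the basis cancels out of the final expression, which is why the bound depends only on the block length and the total number of blocks.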
5.6 Proof of Lemma 1

Proof. We’ll divide the possible input pairs into several cases, which we’ll further
subdivide into groups. We write out the proof only for the first case in each
group, and the rest follow from it. The classifying factors are as follows:

- Whether they both occur in the same layer (encryption layer ($U^i_j$) or
decryption layer ($U'^i_j$)), or in different layers;

- Whether they occur in the right layer (encryption layer of an encryption
query, or decryption layer of a decryption query) or the wrong layer;

- Whether their first-cross indices match (this would be the current query
index if in the wrong layer, and the index after the first backward jump during
extension if in the right layer).

We begin with an easy group of cases, where both occur in the right layer, and
their first-cross indices do not match:

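Case 1a. $(U^i_j, U^{i'}_{j'})$, $i, i' \in E$, $a = A^e_{j-1}(i) < A^e_{j'-1}(i') = a'$

$\mathfrak{B}(U^i_j) = \mathfrak{B}(V^i_{j-1})$ can only contain basis variables with query indices $\le a$. Since
$\mathfrak{B}(U^{i'}_{j'}) = \mathfrak{B}(V^{i'}_{j'-1})$ will contain either $V^{a'}_{j'-1}$ or $V'^{a'}_{l^{a'}}$, $\mathfrak{B}(U^i_j) \ne \mathfrak{B}(U^{i'}_{j'})$.

Case 1b. $(U'^i_j, U'^{i'}_{j'})$, $i, i' \in D$, $a = A^d_{j-1}(i) < A^d_{j'-1}(i') = a'$

Case 1c. $(U^i_j, U'^{i'}_{j'})$, $i \in E$, $i' \in D$, $a = A^e_{j-1}(i) < A^d_{j'-1}(i') = a'$

Case 1d. $(U'^i_j, U^{i'}_{j'})$, $i \in D$, $i' \in E$, $a = A^d_{j-1}(i) < A^e_{j'-1}(i') = a'$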

We next turn to another easy group, where exactly one of them is in the right 
layer, and first-cross indices do not match: 

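Case 2a. $(U^i_j, U'^{i'}_{j'})$, $i, i' \in E$, $a = A^e_{j-1}(i) \ne i'$

If $a < i'$, $V^{i'}_{l^{i'}}$ is in $\mathfrak{B}(U'^{i'}_{j'})$ but not in $\mathfrak{B}(U^i_j)$. If $a > i'$, either $V^a_{j-1}$ is in $\mathfrak{B}(U^i_j)$
but not in $\mathfrak{B}(U'^{i'}_{j'})$, or $V'^a_{l^a}$ is in $\mathfrak{B}(U^i_j)$ but not in $\mathfrak{B}(U'^{i'}_{j'})$.

Case 2b. $(U^i_j, U'^{i'}_{j'})$, $i, i' \in D$, $i \ne A^d_{j'-1}(i') = a'$

Case 2c. $(U^i_j, U^{i'}_{j'})$, $i \in E$, $i' \in D$, $a = A^e_{j-1}(i) \ne i'$

Case 2d. $(U'^i_j, U'^{i'}_{j'})$, $i \in E$, $i' \in D$, $i \ne A^d_{j'-1}(i') = a'$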

The next group is even easier: both in the wrong layer, with non-matching first- 
cross indices. This takes care of all cases with non-matching first-cross indices. 

Case 3a. $(U^i_j, U^{i'}_{j'})$, $i, i' \in D$, $i < i'$

$V'^{i'}_{l^{i'}}$ is in $\mathfrak{B}(U^{i'}_{j'})$ but not in $\mathfrak{B}(U^i_j)$.

Case 3b. $(U'^i_j, U'^{i'}_{j'})$, $i, i' \in E$, $i < i'$

Case 3c. $(U^i_j, U'^{i'}_{j'})$, $i \in D$, $i' \in E$, $i < i'$

Case 3d. $(U'^i_j, U^{i'}_{j'})$, $i \in E$, $i' \in D$, $i < i'$

Next we turn to a slightly trickier group, where they are in the same layer, both
in the right layer, and first-cross indices match.
Case 4. G E^A^ii) = 

Consider = ((b u ai (6 feo , a fco )), C(V^'_ X ) = ((b[, ai), .{b' k , Q , a' fe , )). If 

the chains follow the same query paths (i.e. , if ko = fcg and (Vfc < fco)( a fc = a^)), 
assuming without loss of generality ko is odd and ko £ E (from the chain- 
termination condition), we have V^° G 53(7/]), and V^ 0 G 53(7/]/), all other 

basis variables in the two extension equations being the same. Thus, if bk 0 ^ 
b' ko ,?8(Uj) 7^ 53(7/],'), and if bk 0 = b ' ko , (7/], 7 /],' ) is either a pre-destined collision 
(if P] = P],) or it cannot be a collision. If the chains do not follow the same 
query path, we can find k such that a& 7 - a! k , which reduces to one of the previous 
cases. 

Case 4a. (7//, 7//), i,i' G = A^^i') 

The next group is much simpler, where they are in different layers, both in the 
right layer, and first-cross indices match. 

Case 5. (77], 7//), i G E,i' G D, a = m 

Without loss of generality, a G E. So Vfi is in 53(7/]/) but not in 53(7/]). 

Case 5a. G P,i' G P,^_i(i) = -4]/_ 1 (i / ) 

We’re almost done with the proof at this point. We wrap up with the few remain- 
ing cases. In the next group, they come from different layers, exactly one of them 
in the right layer, and first-cross indices match. 

Case 6. (7/],7//),M' G E,A)_ |(i) = i' 

Here, Vf-, is in 53(7/]/) but not in 53(7/]). 

Case 6a. (7/], 7//), i, i' Gh,i = ^,_ 1 (i') 

The four cases of the final group can be proved using the extension-chain-comparison
technique of Case 4. In this group, they are in the same layer, at
least one in the wrong layer, and first-cross indices match. (If they are both in
the wrong layer, and first-cross indices match, they occur at the same query, so
they cannot be in different layers, so this wraps up the case analysis.)

Case 7. ( G E,i' G D, A^^i) = i' 

Case 7a. (T/f, T/jf), i G E,i' G D,i = ^,_ 1 (i / ) 

Case 7b. (7/],7/],),i G D 
Case 7c. (T/f ,7/jf),i G E 

This leaves only a few boundary cases (involving the likes of $U^i_{l^i}$), which can be
easily verified. We just point out two special cases which underline the
importance of choosing $b$ as a balanced permutation. For the pair $(U^i_1, U'^i_1)$ for some
$i$, if $P^i_1 = C^i_1$, the condition for an accidental collision becomes $V^i_0 + b(V^i_0) = 0$,
which is still $n$ independent linear equations in terms of the bits, by choice of $b$.
Similarly, if $i \sim_{e_{l^i-1}} i'$, and $b(P^i_{l^i}) = P^{i'}_{l^i}$, the pair $(U^i_{l^i}, U^{i'}_{l^i})$ yields the equation
$b(V^i_{l^i-1}) + V^i_{l^i-1} = 0$, which again is $n$ independent linear equations in terms of
the bits.

Thus we establish our lemma. □ 

6 Conclusion and Future Works 

In this paper we propose a new Feistel type length preserving tweakable encryp- 
tion scheme. Our construction, called FMix, has several advantages over CMC 
and other blockcipher based enciphering scheme. It makes an optimal number of 
blockcipher calls using single keyed PRP blockcipher. The only drawback com- 
pare to EME is that the first layer of encryption, like CMC, is sequential. We 
can view our construction as a composition of type-1 and type-3 Feistel ciphers. 

There are several possible directions for future work. When we apply a generic
method to encrypt the last partial message block, we need an independent key. (This
is always true for generic constructions.) However, one could devise a specific way
to handle a partial message block while keeping only one blockcipher key. The presence
of the function b helps us to simplify the security proof. However, we do not
know of any attack if we do not use this function (except for handling the tweak
in the bottom layer, where that use is necessary). So it would be interesting to see
whether our proof can be extended to the variant that does not use the function b.

Abstract. In this work, we study the intrinsic complexity of black-box
Universally Composable (UC) secure computation based on general
assumptions. We present a thorough study of various corruption models
while focusing on achieving security in the common reference string
(CRS) model. Our results involve the following:

- Static UC Secure Computation. Designing the first static UC 
secure oblivious transfer protocol based on public-key encryption and 
stand-alone semi-honest oblivious transfer. As a corollary we obtain 
the first black-box constructions of UC secure computation assuming 
only two-round semi-honest oblivious transfer. 

- One-sided UC Secure Computation. Designing adaptive UC
secure two-party computation with single corruptions assuming
public-key encryption with oblivious ciphertext generation.

- Adaptive UC Secure Computation. Designing an adaptively secure
UC commitment scheme assuming only public-key encryption with
oblivious ciphertext generation. As a corollary we obtain the first
black-box constructions of adaptive UC secure computation assuming
only (trapdoor) simulatable public-key encryption (as well as a
variety of concrete assumptions).

We remark that such a result was not known even under non-black-box 
constructions. 


Keywords: UC secure computation • Black-box constructions • Obliv- 
ious transfer • UC commitments 


1 Introduction 

Secure multi-party computation enables a set of parties to mutually run a protocol
that computes some function f on their private inputs, while preserving a number
of security properties. Two of the most important properties are privacy and

C. Hazay — Research partially supported by a grant from the Israel Ministry of Sci- 
ence and Technology (grant No. 3-10883). 

M. Venkitasubramaniam — Research supported by Google Faculty Research Grant 
and NSF Award CNS-1526377. 

(c) International Association for Cryptologic Research 2015 

T. Iwata and J.H. Cheon (Eds.): ASIACRYPT 2015, Part II, LNCS 9453, pp. 183-209, 2015. 

DOI: 10.1007/978-3-662-48800-3_8



184 C. Hazay and M. Venkitasubramaniam 


correctness. The former implies data confidentiality, namely, nothing is leaked by the
protocol execution beyond the computed output. The latter requirement implies that
no corrupted party or parties can cause the output to deviate from the specified
function. It is by now well known how to securely compute any efficient functionality
[2,4,24,45,50] in various models and under the stringent simulation-based
definitions (following the ideal/real paradigm). Security is typically proven with 
respect to two adversarial models: the semi-honest model (where the adversary 
follows the instructions of the protocol but tries to learn more than it should 
from the protocol transcript), and the malicious model (where the adversary 
follows an arbitrary polynomial-time strategy), and feasibility results are known 
in the presence of both types of attacks. The initial model considered for secure 
computation was of a static adversary where the adversary controls a subset 
of the parties (who are called corrupted) before the protocol begins, and this 
subset cannot change. In a stronger corruption model the adversary is allowed 
to choose which parties to corrupt throughout the protocol execution, and as a 
function of its view; such an adversary is called adaptive. 

These feasibility results rely in most cases on stand-alone security, where 
a single set of parties run a single execution of the protocol. Moreover, the 
security of most cryptographic protocols proven in the stand-alone setting does 
not remain intact if many instances of the protocol are executed concurrently 
[40]. The strongest (but also the most realistic) setting for concurrent security is
known as Universally Composable (UC) security [4]. This setting considers the
execution of an unbounded number of concurrent protocols in an arbitrary and
adversarially controlled network environment. Unfortunately, stand-alone secure
protocols typically fail to remain secure in the UC setting. In fact, without
assuming some trusted help, UC security is impossible to achieve for most tasks
[7,8,40]. Consequently, UC secure protocols have been constructed under various
trusted setup assumptions in a long series of works; see [1,5,10,14,34,38] for a few
examples.

In this work, we are interested in understanding the intrinsic complexity 
of UC secure computation. Identifying the general assumptions required for a 
particular cryptographic task provides an abstraction of the functionality and 
the specific hardness that is exploited to obtain a secure realization of the task. 
The expressive nature of general assumptions allows the use of a large number of 
concrete assumptions of our choice, even one that may not have been considered 
at the time of designing the protocols. Constructions that are based on general 
assumptions are proven in two flavors: 

Black-box Usage: A construction is black-box if it refers only to the 
input/output behavior of the underlying primitives. 

Non-black-box Usage: A construction is non-black-box if it uses the code
computing the functionality of the underlying primitives.

Typically, non-black-box constructions have been employed to demonstrate 
feasibility and derive the minimal assumptions required to achieve cryptographic 
tasks. An important theoretical question is whether or not non-black-box usage
of the underlying primitive is necessary in a construction. Besides its theoretical
importance, obtaining black-box constructions is related to efficiency, as an
undesirable effect of non-black-box constructions is that they are typically inefficient
and unlikely to be implemented in practice. Fortunately, a recent line
of works [25,26,32,47] has narrowed the gap between what is achievable via 
non-black-box and black-box constructions under minimal assumptions. 

More relevant to our context, the work of Ishai, Prabhakaran and Sahai 
[33] provided the first black-box constructions of UC secure protocols assuming 
only one-way functions in a model where all parties have access to an ideal 
oblivious transfer (OT) functionality. Orthogonally, Choi et al. [12] provided a 
compiler that transforms any semi-honest OT to a protocol that is secure against 
malicious static adversaries in the stand-alone (i.e., not UC) setting while assuming that
all parties have access to the ideal commitment functionality. In the adaptive
setting, the work of Choi et al. provides a transformation from adaptively secure
semi-honest oblivious transfer to one that is secure in the stronger UC setting
against malicious adaptive adversaries while assuming that all parties have access
to the ideal commitment functionality. In essence, these works provide black-box
constructions; however, they fall short of identifying the necessary minimal
general computational assumptions in the UC setting.

Loosely speaking, a UC commitment scheme [7] is a fundamental building 
block in secure computation which is defined in two phases: in the commit phase 
a committer commits to a value while keeping it hidden, whereas in the decommit 
phase the committer reveals the value that it previously committed to. In addition
to the standard binding and hiding security properties that any commitment
must adhere to, commitment schemes that are secure in the UC framework must
allow straight-line extraction (where a simulator should be able to extract the 
content of any valid commitment generated by the adversary) and straight-line 
equivocation (where a simulator should be able to produce many commitments 
for which it can later decommit to both 0 and 1). We stress that even security 
in the static setting requires some notion of equivocation. Due to these rigorous 
requirements, it has been a real challenge to design black-box constructions of 
UC secure commitment schemes. 
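
To make the two-phase interface concrete, the following is a minimal sketch of a plain hash-based commitment. It is an illustration only and not a construction from this paper: it is not UC secure, it offers neither straight-line extraction nor equivocation, and its hiding and binding properties are heuristic, resting on the assumption that SHA-256 behaves like a random oracle. The function names `commit` and `verify` are hypothetical.

```python
import hashlib
import hmac
import secrets


def commit(value):
    """Commit phase: return (commitment, opening) for a byte string."""
    # A fresh 32-byte opening hides the value; its fixed length keeps the
    # concatenation opening || value unambiguous.
    opening = secrets.token_bytes(32)
    commitment = hashlib.sha256(opening + value).digest()
    return commitment, opening


def verify(commitment, value, opening):
    """Decommit phase: check that (value, opening) opens the commitment."""
    expected = hashlib.sha256(opening + value).digest()
    return hmac.compare_digest(commitment, expected)
```

A committer would send `commitment` in the commit phase and `(value, opening)` in the decommit phase. A UC simulator needs more than this interface offers, namely a trapdoor for extracting `value` from `commitment` and for opening a commitment to either bit.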

In the context of realizing UC commitments in the CRS model, Damgård
and Nielsen introduced the notion of mixed-commitments in [16]. This construction
requires a CRS that is linear in the number of parties and can be instantiated
under the N-residuosity and p-subgroup hardness assumptions. In the
global CRS model (where a single CRS is introduced for any number of executions),
the only known constructions are by Damgård and Groth [15], based
on the Strong RSA assumption, and Lindell [42], based on the DDH assumption,
where the former construction guarantees security in the adaptive setting
whereas the latter construction provides static security.

Another fundamental building block in secure computation which has been 
widely studied is oblivious transfer [21,49]. Semi-honest two-round oblivious 
transfer can be constructed based on enhanced trapdoor permutations [21] and
smooth projective hashing [28], and concretely under the Decisional Diffie-Hellman
(DDH) assumption [46]. Two-round protocols with malicious UC security are presented in
the influential paper by Peikert et al. [48], which presents a black-box framework in
the common reference string (CRS) model for oblivious transfer, based on dual-mode
public-key encryption (PKE) schemes, which can be concretely instantiated
under the DDH, quadratic residuosity and Learning with Errors (LWE)
hardness assumptions. In a follow-up work [13], the authors present UC oblivious
transfer constructions in the global CRS model assuming DDH, N-residuosity
and the Decision Linear Assumption (DLIN). As pointed out in [13], the constructions of [48]
require a distinct CRS per party. In the context of adaptive UC
oblivious transfer protocols, the works of [12] and [22] give constructions in the
UC commitment hybrid model where they additionally rely on an assumption
that implies adaptive semi-honest oblivious transfer.

It is worth noting that while the works of [48] and [13] provide abstrac- 
tions of their assumptions, the assumptions themselves are not general enough 
to help understand the minimal assumptions required to achieve static UC secu- 
rity. In particular, when restricting attention to black-box constructions based 
on general assumptions, the state-of-the-art literature seems to indicate that 
achieving UC security in most trusted setup models reduces to constructing 
two apparently incomparable primitives: semi-honest oblivious transfer and UC 
commitment schemes. This leaves the following important question open: 

What are the minimal (general) assumptions required to construct UC
secure protocols, given only black-box access to the underlying primitives?

We note that this question is already well understood in the static setting 
when relaxing the black-box requirement. Namely, in [18] Damgård, Nielsen
and Orlandi showed how to construct UC commitments assuming only semi- 
honest oblivious transfer in the global CRS model, while additionally assuming 
a pre-processing phase where the parties participate in a round-robin manner 1 . 
More recently, Lin, Pass and Venkitasubramaniam [39] improved this result by 
removing any restricted pre-processing phase. In the same work the authors 
showed how to achieve UC security in the global CRS model assuming only the 
existence of semi-honest oblivious transfer. In particular, this construction shows 
that static UC security can be achieved without assuming UC commitments 
when relying on non-black-box techniques. 

In the stand-alone (i.e. not UC) setting, assuming only the existence of semi- 
honest oblivious transfer [26,27,32] show how to construct secure multiparty 
computation protocols while relying on the underlying primitives in a black-box 
manner. More recently, [12] provided black-box constructions that are secure 
against static adversaries, again, in the stand-alone setting, where all parties 
have access to an ideal commitment functionality (cf. Proposition 1 in [12]). The 
latter construction achieves a stronger notion of straight-line simulation, but
falls short of achieving static UC security (see more details in Sect. 3).


1 In such a pre-processing phase, it is assumed that at most one party is allowed to 
transmit messages in any round. 

In the adaptive setting, the only work that considers a single general assumption
that implies adaptive UC security using non-black-box techniques is the
result due to Dachman-Soled et al. [14], which shows how to obtain adaptive
UC commitments assuming simulatable PKE. Moreover, the best known general
assumptions required to achieve black-box UC security are adaptive semi-honest
oblivious transfer and UC commitments [12,17]. Known minimal general
assumptions that are required to construct these primitives are (trapdoor) simulatable
PKE for adaptive semi-honest oblivious transfer [11] and mixed commitments
for UC commitments [17].


1.1 Our Results 

In this paper we present a thorough study of black-box UC secure computation 
in the CRS model; details follow. 


Static UC Secure Computation. Our first result is given in the static set- 
ting, where we demonstrate the feasibility of UC secure computation based on 
semi-honest oblivious transfer and extractable commitments. More concretely, 
we prove how to transform any statically semi-honest secure oblivious transfer
into one that is secure in the presence of malicious adversaries, giving only
black-box access to the underlying semi-honest oblivious transfer protocol. Our
approach is inspired by the protocols from [27] and [37], where we observe that 
it is not required to use the full power of static UC commitments. Instead, we 
employ a weaker primitive that only requires straight-line input extractability. 
Interestingly, we prove that this weaker notion of security, denoted by extractable 
commitments [44], can be realized based on any CPA secure PKE. More pre- 
cisely, we prove the following theorem. 

Theorem 11 (Informally). Assuming the existence of PKE and semi-honest 
oblivious transfer, then any functionality can be realized in the CRS model with 
static UC security, where the underlying primitives are accessed in a black-box 
manner. 

We remark here that this theorem makes significant progress towards reducing
the general assumptions required to construct UC secure protocols. Previously,
the only general assumptions based on which we knew how to construct UC
secure protocols were mixed-commitments [16] and dual-mode PKE [48], both
of which were tailor-made for the particular application. Towards understanding
the required minimal assumptions, we recall the work of Damgård and Groth [15],
who showed that the existence of UC commitments in the CRS model implies a
stand-alone key agreement protocol. Moreover, under black-box constructions, 
the seminal work of Impagliazzo and Rudich [31] implies that key agreement 
cannot be based on one-way functions. Thus, there is reasonable evidence to 
believe that some public-key primitive is required for UC commitments. In that 
sense, our assumption regarding PKE is close to being optimal. Nevertheless, it 
is unknown whether the semi-honest oblivious transfer assumption is required. 

Our result is shown in two phases. First, we compile the semi-honest oblivious
transfer protocol into a new protocol with intermediate security properties
in the presence of malicious adversaries. This transformation is an extension of
the transformation of [27], which is only proven for bit oblivious transfer, whereas our
proof works for string oblivious transfer. Next, we use the transformed oblivious
transfer protocol in order to construct a maliciously fully secure oblivious
transfer. By combining our oblivious transfer with the protocol of [33], we obtain
generic UC secure computation in the static setting.

An important corollary is deduced from the work by Gertner et al. [23], 
who provided a black-box construction of PKE based on any two-round semi- 
honest oblivious transfer protocol. Specifically, the combination of their result 
with ours implies the following corollary, which demonstrates that two-round 
semi-honest oblivious transfer is sufficient in the CRS model to achieve black- 
box constructions of UC secure protocols. 

Corollary 12 (Informally). Assuming the existence of two-round semi-honest 
oblivious transfer, then any functionality can be UC realized in the CRS model, 
where the oblivious transfer is accessed in a black-box manner. 

Implications. In what follows, we make a sequence of interesting observations 
that are implied by our result in the static UC setting. 

- The important result by Canetti, Lindell, Ostrovsky and Sahai [9] presents 
the first non-black-box constructions of static UC secure protocols assuming 
enhanced trapdoor permutations. In fact, their result can be extended assum- 
ing only PKE with oblivious ciphertext generation (which is PKE with the 
special property that a ciphertext can be obliviously sampled without the 
knowledge of the plaintext, and can be further realized using enhanced trap- 
door permutation). In that sense, our result, assuming PKE with oblivious 
ciphertext generation, can be viewed as an improvement of [9] when relying 
on this primitive in a black-box manner. 

- The pair of works by Damgård, Nielsen and Orlandi [18] and Lin, Pass and
Venkitasubramaniam [39] demonstrate that non-black-box constructions of
UC commitments, and more generally static UC secure computation, can be
achieved in the CRS model assuming only semi-honest oblivious transfer. In
comparison, our result shows that two-round semi-honest oblivious transfer
protocols are sufficient for obtaining black-box UC secure computation in the
CRS model. Note that most semi-honest oblivious transfer protocols
require only two rounds of communication anyway, e.g., [21].

- In [38,39], Lin, Pass and Venkitasubramaniam provided a unified framework 
for constructing UC secure protocols in any “trusted-setup” model. Their 
result is achieved by capturing the minimal requirement that implies UC com- 
putations in the setup model. More precisely, they introduced the notion of 
a UC puzzle and showed that any setup model that admits a UC puzzle can 
be used to securely realize any functionality in the UC setting, while addi- 
tionally assuming the existence of semi-honest oblivious transfer. Moreover, 
they showed how to easily construct such puzzles in most models. We remark
that our approach can be viewed as providing a framework to construct black-box
UC secure protocols in other UC models. More precisely, we show that
any setup model that admits the extractable commitment functionality can
be used to securely realize any functionality assuming the existence of semi-honest
oblivious transfer. In fact, our result easily extends to the chosen key
registration authority (KRA) model [1], which assumes the existence of
a trusted authority that samples (public key, secret key) pairs for each party
and broadcasts the public key to all parties. We leave it for future work to
instantiate our framework in other setup models.

- The fact that our construction only requires PKE and semi-honest oblivious 
transfer allows an easy translation of static UC security to various efficient 
implementations under a wide range of concrete assumptions. Specifically, 
both PKE and (two-round) semi-honest oblivious transfer can be realized 
under RSA, factoring Blum integers, LWE, DDH, N-residuosity, p-subgroup
and coding assumptions. This is compared to prior results that could be based
on the latter five assumptions [13,19,20,48].

- Recently, Maji, Prabhakaran, and Rosulek [44] initiated the study of the cryp- 
tographic complexity of secure computation tasks, while characterizing the 
relative complexity of a task in the UC setting. Specifically, they established 
a zero-one law that states that any task is either trivial (i.e., it can be reduced
to any other task) or complete (i.e., any task can be reduced to it),
where a functionality $\mathcal{F}$ is said to reduce to another functionality $\mathcal{G}$ if there
is a UC secure protocol for $\mathcal{F}$ using ideal access to $\mathcal{G}$. More precisely, they
showed that assuming the existence of semi-honest oblivious transfer, every
finite two-party functionality is either trivial or complete. While their main 
theorem relies on the minimal assumption of semi-honest oblivious transfer, 
their use of the assumption is non-black-box and they leave it as an open 
problem to achieve the same while relying on oblivious transfer in a black-box 
manner. Our result makes progress towards establishing this. 

In more detail, their high-level approach is to identify complete functionalities
using four categories, namely, (1) $\mathcal{F}_{\mathrm{XOR}}$, which abstracts a XOR-type
functionality, (2) $\mathcal{F}_{\mathrm{CC}}$, which abstracts a simple cut-and-choose functionality,
(3) $\mathcal{F}_{\mathrm{OT}}$, the oblivious transfer functionality, and (4) $\mathcal{F}_{\mathrm{COM}}$, the commitment
functionality. They then show that each category can be used to securely realize any
computational task (see footnote 2). Among these reductions, functionalities $\mathcal{F}_{\mathrm{XOR}}$ and
$\mathcal{F}_{\mathrm{CC}}$ rely on oblivious transfer in a non-black-box way. In this work we improve the
reduction of functionality $\mathcal{F}_{\mathrm{CC}}$. That is, we obtain this improvement by showing
that the extractable commitment functionality $\mathcal{F}_{\mathrm{EXTCOM}}$ and semi-honest
oblivious transfer can be used in a black-box way to realize functionality $\mathcal{F}_{\mathrm{OT}}$,
and combine this with a reduction presented in [44] that reduces $\mathcal{F}_{\mathrm{CC}}$ to the
$\mathcal{F}_{\mathrm{EXTCOM}}$ functionality in a black-box way.

One-Sided UC Secure Computation. In this stronger two-party setting, 

where at most one of the parties is adaptively corrupted [29,35], we prove that 

2 Where it suffices to realize the $\mathcal{F}_{\mathrm{OT}}$ functionality, as it is known to be complete [36].
one-sided adaptive UC security is implied by PKE with oblivious ciphertext generation.
Here we combine two observations: first, our malicious static oblivious
transfer from the previous result requires using the parties' inputs in only one
phase; second, one-sided non-committing encryption (NCE)
can be designed based on PKE with oblivious ciphertext generation [6,16]. In
particular, NCE allows secure communication in the presence of adaptive attacks,
which implies that the communication can be equivocated once the real message 
is handed to the simulator. Then, by encrypting part of our statically secure pro- 
tocol using NCE, we obtain a generic protocol for any two-party functionality 
under the assumption specified above 3 . Namely, 

Theorem 13 (Informally). Assuming the existence of PKE with oblivious
ciphertext generation, any two-party functionality can be realized in the CRS
model with one-sided adaptive UC security and black-box access to the PKE.

Adaptive UC Secure Computation. Our last result is in the strongest cor- 
ruption setting, where any number of parties can be adaptively corrupted. Here 
we design a new adaptively secure UC commitment scheme under the assump- 
tion of PKE with oblivious ciphertext generation, which is the first construction 
that achieves the stronger notion of adaptive security based on this hardness 
assumption. Our construction makes a novel usage of such a PKE together with 
Reed-Solomon codes, where the polynomial shares are encrypted using the PKE 
with oblivious ciphertext generation. Plugging our UC commitment protocol
into the transformation of [12] that generates adaptive malicious oblivious
transfer given adaptive semi-honest oblivious transfer and UC commitments, 
implies an adaptively UC secure oblivious transfer protocol with malicious secu- 
rity based on semi-honest adaptive oblivious transfer and PKE with oblivious 
ciphertext generation, using only black-box access to the semi-honest oblivious 
transfer and the PKE. That is, 

Theorem 14 (Informally). Assuming the existence of PKE with oblivious 
ciphertext generation and adaptive semi-honest oblivious transfer, then any func- 
tionality can be realized in the CRS model with adaptive UC security, where the 
underlying primitives are accessed in a black-box manner. 

We further recall the work of Choi et al. [11] that shows that the weakest general 
known assumption that is required to construct adaptively secure semi-honest 
oblivious transfer is trapdoor simulatable PKE. Now, since such an encryption 
scheme admits PKE with oblivious ciphertext generation, we obtain the follow- 
ing corollary that unifies the two assumptions required to achieve adaptive UC 
security. 

Corollary 15. Assuming the existence of (trapdoor) simulatable PKE, then any 
functionality can be realized in the CRS model with adaptive UC security and 
black-box access to the PKE. 

3 We note that while in the plain model any statically secure protocol can be compiled 
into one-sided secure protocol by encrypting its entire communication using one- 
sided NCE, it is not the case in the UC setting due to the additional setup. 


On Black-Box Complexity of Universally Composable Security 191 


An additional interesting observation that is implied by our work is that 
our UC commitment scheme implies a construction that is secure in the adap- 
tive setting when erasures are allowed, and under the weaker assumption of 
PKE. Specifically, instead of obliviously sampling ciphertexts in the commit- 
ment phase, the committer encrypts arbitrary plaintexts and then erases the 
plaintexts and randomness used for these computations. Our proof follows eas- 
ily for this case as well. Combining our UC commitment scheme together with 
the semi-honest OT with erasures from [41] and the transformation of [12], we
obtain the following result.

Theorem 16 (Informally). Assuming the existence of PKE and semi-honest 
oblivious transfer secure against an adaptive adversary assuming erasures, then 
any functionality can be realized in the CRS model with adaptive UC security 
assuming erasures, where the underlying primitives are accessed in a black-box 
manner. 

Noting that OT secure against adaptive adversaries assuming erasures can be 
realized under assumptions sufficient for achieving the same with respect to the 
weaker static adversaries, this theorem shows that achieving UC security against 
adaptive adversaries in the presence of erasures does not require any additional 
assumption beyond what is required to secure against static adversaries. 

Implications. Next, we specify a sequence of interesting observations that are 
implied by our result in the adaptive UC setting. 

- Previously, Dachman-Soled et al. [14] showed that adaptive UC secure protocols
can be constructed in the CRS model assuming the existence of simulatable
PKE. Our result improves this result in terms of complexity assumptions
by showing that trapdoor simulatable PKE is sufficient, and provides new 
constructions based on concrete assumptions that were not known before. 
Nevertheless, we should point out that while the work of Dachman-Soled et 
al. is constructed in the global CRS model using a non-black-box construc- 
tion, our result provides a black-box construction in a CRS model where the 
length of the reference string is linear in the number of parties. 

- Analogous to our result on static UC security, it is possible to extend this result 
to the chosen key-registration authority (KRA) model, where we assume the 
existence of a trusted party that samples public keys and secret keys for each
party, and broadcasts the public key to all parties. 

- Importantly, this result provides the first evidence that adaptively secure UC 
commitment is theoretically easier to construct than stand-alone adaptively 
secure semi-honest oblivious transfer. This is due to a separation from [43]
(regarding static vs. adaptive oblivious transfer), that proves that adaptive 
oblivious transfer requires a stronger hardness assumption than enhanced 
trapdoor permutation. 

- Regarding concrete assumptions, previously, adaptive UC commitments without
erasures were constructed based on the N-residuosity and p-subgroup hardness
assumptions [17] and Strong RSA [15]. On the other hand, our result
demonstrates the feasibility of this primitive under the DDH, LWE, factoring
Blum integers and RSA assumptions. When considering adaptive corruption
with erasures, the work of Blazy et al. [3], extending the work of Lindell [42],
shows how to construct highly efficient UC commitments based on the DDH
assumption. On the other hand, assuming erasures, we are able to construct
an adaptive UC commitment scheme based on any CPA-secure PKE.

2 Preliminaries 

We denote the security parameter by n. We use the abbreviation PPT to denote 
probabilistic polynomial-time. We further denote by $a \leftarrow A$ the random sampling
of $a$ from a distribution $A$, and by $[n]$ the set of elements $\{1, \ldots, n\}$.

Definition 21 (PKE with Oblivious Ciphertext Generation [16]). A PKE
$\Pi$ with oblivious ciphertext generation is defined by the tuple
$(\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec}, \widetilde{\mathsf{Enc}}, \widetilde{\mathsf{Enc}}^{-1})$
and has the following additional property:

- Indistinguishability of Oblivious and Real Ciphertexts. For any message
$m$ in the appropriate domain, consider the experiment

$$(PK, SK) \leftarrow \mathsf{Gen}(1^n),\quad c_1 \leftarrow \widetilde{\mathsf{Enc}}_{PK}(r_1),\quad c_2 \leftarrow \mathsf{Enc}_{PK}(m; r_2),\quad r_1' \leftarrow \widetilde{\mathsf{Enc}}^{-1}_{PK}(c_2).$$

Then $(PK, r_1, c_1, m) \approx_c (PK, r_1', c_2, m)$.

To this end, we only employ PKE with perfect decryption. This merely simplifies 
the analysis and can be relaxed by using PKE with a negligible decryption error 
instead. 
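
As an illustration of the interface in Definition 21, the sketch below uses ElGamal over the quadratic residues modulo a safe prime. Treating this as an instance of oblivious ciphertext generation is an assumption made for the example, not a claim of the paper. The parameters are toy-sized and insecure, and the names `oblivious_enc` and `inverse_sample` are hypothetical stand-ins for the oblivious sampler and its inverse.

```python
import secrets

P = 1019  # toy safe prime: P = 2*Q + 1 and P % 4 == 3 (far too small to be secure)
Q = 509   # prime order of the subgroup of quadratic residues mod P
G = 4     # generator of that subgroup


def gen():
    """Return (pk, sk) for ElGamal in the order-Q subgroup."""
    sk = secrets.randbelow(Q - 1) + 1
    return pow(G, sk, P), sk


def enc(pk, m, r):
    """Encrypt m (a quadratic residue mod P) with randomness r."""
    return pow(G, r, P), (m * pow(pk, r, P)) % P


def dec(sk, c):
    c1, c2 = c
    return (c2 * pow(c1, P - 1 - sk, P)) % P  # c2 / c1^sk by Fermat


def _rand_unit():
    return secrets.randbelow(P - 1) + 1  # uniform in [1, P-1]


def oblivious_enc(pk):
    """Sample a ciphertext without choosing a plaintext; return (c, r).

    pk is unused in this toy instance and kept only for interface parity.
    """
    r = (_rand_unit(), _rand_unit())
    # Squaring maps uniform units onto uniform quadratic residues.
    return (pow(r[0], 2, P), pow(r[1], 2, P)), r


def _rand_sqrt(x):
    s = pow(x, (P + 1) // 4, P)  # a square root of x, valid since P % 4 == 3
    return s if secrets.randbelow(2) else P - s


def inverse_sample(c):
    """Explain a real ciphertext as if it had been obliviously sampled."""
    return _rand_sqrt(c[0]), _rand_sqrt(c[1])
```

For a real ciphertext `c = enc(pk, m, r)`, `inverse_sample(c)` returns randomness that `oblivious_enc` could have used to produce `c`. This is the role played by $r_1'$ in the experiment above.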


2.1 Oblivious Transfer 

1-out-of-2 oblivious transfer (OT) is an important functionality in the context
of secure computation that is engaged between a sender Sen and a receiver Rec;
see Fig. 1 for the description of functionality $\mathcal{F}_{\mathrm{OT}}$. In this paper we are interested
in reducing the hardness assumptions for general UC secure computation when
using only black-box access to the underlying cryptographic primitives, such as
the semi-honest OT. We use semi-honest OT as a building block for designing
UC secure protocols in both static and adaptive settings. In the static setting, 
we refer to the two-round protocol of [21] that is based on PKE with oblivi- 
ous ciphertext generation (or enhanced trapdoor permutation). In the adaptive 
setting, we refer to the two-round protocol of [9] that is based on augmented 
non-committing encryption scheme. 

We next recall that any two-round semi-honest OT implies PKE. We demon- 
strate that in two phases, starting with the claim that semi-honest OT implies 
a key agreement (KA) protocol, where two parties agree on a secret key over 
a public channel. This statement has already been proven in [23] in the static 
setting, and holds for any number of rounds. The idea is simple: the parties
execute an OT protocol where the party that plays the sender picks two random
inputs $s_0, s_1$, whereas the party that plays the receiver enters 0. Finally, the parties
output $s_0$, and security follows from the correctness and privacy of the OT.
A simple observation shows that this reduction also holds in the adaptive setting.
Namely, starting with an adaptive semi-honest OT, the same reduction implies 
an adaptively secure KA (where the protocol communication must be consis- 
tent with respect to any key). Note that this reduction preserves the number of 
rounds, thus if the starting point is a two-round OT then the reduction implies 
a two-round KA. Next, a well established fact shows that in the static setting a 
two-round key agreement implies PKE (in fact, these primitives are equivalent). 
Formally, 

Theorem 22. Assume the existence of a two-round key agreement protocol with
static security. Then there exists an IND-CPA secure PKE.
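
The reduction from OT to key agreement recalled before Theorem 22 can be illustrated with a minimal sketch. The function `ideal_ot` is a hypothetical stand-in for an execution of a semi-honest OT protocol (it is not a real protocol and offers no security by itself); the key length is likewise an assumption made only for illustration.

```python
import secrets


def ideal_ot(sender_inputs, receiver_choice):
    """Hypothetical stand-in for one OT execution: returns the chosen string."""
    s0, s1 = sender_inputs
    return s1 if receiver_choice else s0


def key_agreement(key_len=16):
    """Sketch of the KA-from-OT reduction; returns both parties' outputs."""
    # The party playing the OT sender samples two random strings s0, s1.
    s0 = secrets.token_bytes(key_len)
    s1 = secrets.token_bytes(key_len)
    # The party playing the OT receiver always enters the choice bit 0.
    received = ideal_ot((s0, s1), 0)
    # Both parties output s0: the sender chose it, the receiver learned it.
    return s0, received
```

Correctness of the OT gives agreement on the key, while the privacy of the OT is what hides $s_0$ from an eavesdropper in a real instantiation.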


Functionality $\mathcal{F}_{OT}$

Functionality $\mathcal{F}_{OT}$ communicates with sender Sen and receiver Rec, and
adversary $\mathcal{S}$.

1. Upon receiving input (sender, sid, $v_0, v_1$) from Sen where $v_0, v_1 \in \{0,1\}^*$,
record (sid, $v_0, v_1$).

2. Upon receiving (receiver, sid, $u$) from Rec, where a tuple (sid, $v_0, v_1$) is recorded
and $u \in \{0,1\}$, send (sid, $v_u$) to Rec and sid to $\mathcal{S}$. Otherwise, abort.


Fig. 1. The oblivious transfer functionality.
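
As a reading aid, the functionality of Fig. 1 can be modeled by a small trusted-party object. This is only a toy sketch: the class and method names are illustrative assumptions, and the adversary's view is modeled as a simple list of leaked session identifiers.

```python
class IdealOT:
    """Toy model of the OT functionality in Fig. 1 (names are illustrative)."""

    def __init__(self):
        self.records = {}  # sid -> (v0, v1) recorded from the sender
        self.leaked = []   # what the adversary S gets to see

    def sender_input(self, sid, v0, v1):
        # Step 1: record the sender's pair of strings.
        self.records[sid] = (v0, v1)

    def receiver_input(self, sid, u):
        # Step 2: deliver v_u to the receiver and leak only sid to S.
        if sid not in self.records or u not in (0, 1):
            raise RuntimeError("abort")
        self.leaked.append(sid)
        return (sid, self.records[sid][u])
```

Note that the adversary learns neither the choice bit nor either string, only that the session took place.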


Sender Private Oblivious Transfer. Sender privacy is a weaker notion than 
malicious security and only requires that the receiver’s input be hidden even 
against a malicious sender. It is weaker than malicious security in that it does 
not require a simulation of the malicious sender that extracts the sender’s inputs. 
In particular, we will only require that a malicious sender cannot distinguish the 
cases where the receiver’s input is 0 or 1. Formally stated, 

Definition 23 (Sender Private OT). Let $\pi$ be a two-party protocol that is
engaged between a sender Sen and a receiver Rec. We say that $\pi$ is a sender
private oblivious transfer protocol, if for every PPT adversary $\mathcal{A}$ that corrupts
Sen, the following ensembles are computationally indistinguishable:

- $\{\mathrm{View}_{\mathcal{A},\pi}[\mathcal{A}(1^n), \mathrm{Rec}(1^n, 0)]\}_{n \in \mathbb{N}}$

- $\{\mathrm{View}_{\mathcal{A},\pi}[\mathcal{A}(1^n), \mathrm{Rec}(1^n, 1)]\}_{n \in \mathbb{N}}$

where $\mathrm{View}_{\mathcal{A},\pi}[\mathcal{A}(1^n), \mathrm{Rec}(1^n, b)]$ denotes $\mathcal{A}$'s view within $\pi$ whenever the
receiver Rec inputs the bit $b$.

We point out that sender privacy protects the receiver against a malicious sender 
and should be read as privacy against a malicious sender. 
Defensibly Private Oblivious Transfer. The notion of defensible privacy
was introduced by Haitner in [26,27]. A defense in a two-party protocol
$\pi = (P_1, P_2)$ execution is an input and random tape provided by the adversary
after the execution concludes. A defense for a party controlled by the adversary is
said to be good if, had this party participated honestly in the protocol using this
very input and random tape, it would have produced the exact same messages
that were sent by the adversary. In essence, this defense serves as a proof of
honest behavior. It could very well be the case that an adversary deviates from
the protocol in the execution but later provides a good defense. The notion of
defensible privacy says that a protocol is private in the presence of defensible
adversaries if the adversary learns nothing more than its prescribed output when
it provides a good defense.

We informally describe the notion of good defense for a protocol $\pi$; we refer
to [27] for the formal definition. Let $\mathrm{trans} = (q_1, a_1, \ldots, q_\ell, a_\ell)$ be the transcript
of an execution of a protocol $\pi$ that is engaged between $P_1$ and $P_2$ and let $\mathcal{A}$
denote an adversary that controls $P_1$, where $q_i$ is the $i$th message from $P_1$ and
$a_i$ is the $i$th message from $P_2$ (that is, $a_i$ is the response for $q_i$). Then we say that
$(x, r)$ constitutes a good defense of $\mathcal{A}$ relative to trans if the transcript generated
by running the honest algorithm for $P_1$ with input $x$ and random tape $r$ against
$P_2$'s messages $a_1, \ldots, a_\ell$ results in trans.
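
The informal notion above amounts to replaying the honest algorithm and comparing messages. The sketch below makes this concrete under stated assumptions: `honest_p1` is a hypothetical next-message function (a hash of the input, tape and answers seen so far) chosen only so that the example runs; any deterministic honest algorithm could be plugged in instead.

```python
import hashlib


def honest_p1(x, r, prev_answers):
    """Hypothetical honest next-message function of P1 (illustration only)."""
    h = hashlib.sha256()
    h.update(x)
    h.update(r)
    for a in prev_answers:
        h.update(a)
    return h.digest()


def is_good_defense(x, r, trans, next_message=honest_p1):
    """Check whether (x, r) is a good defense relative to trans.

    trans is a list [(q_1, a_1), ..., (q_l, a_l)] of P1 messages and
    P2 responses, in order.
    """
    answers = []
    for q_i, a_i in trans:
        # Replay the honest algorithm against P2's recorded messages.
        if next_message(x, r, answers) != q_i:
            return False
        answers.append(a_i)
    return True
```

An adversary may have produced the messages $q_i$ in any way it likes; the defense is good as long as some input and tape explain them.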

The notion of defensible privacy can be defined for any secure computation 
protocol. Nevertheless, since we are only interested in oblivious transfer proto- 
cols, we present a definition below that is restricted to oblivious transfer proto- 
cols. The more general definition can be found in [27]. At a high-level, an OT 
protocol is defensibly private with respect to a corrupted sender if no adversary 
interacting with an honest receiver with input b should be able to learn b, if at the 
end of the execution the adversary produces any good defense. Similarly, an OT 
protocol that is defensibly private with respect to malicious receivers requires 
that any adversary interacting with an honest sender with input $(s_0, s_1)$ should
not be able to learn $s_{1-b}$, if at the end of the execution the adversary produces
a good defense with input b. Below we present a variant of the definition pre- 
sented in [27]. We stress that while the [27] definition only considers bit OT (i.e. 
sender’s inputs are bits) we consider string OT. 

Definition 24 (Defensible-Private String OT). Let $\pi$ be a two-party protocol
that is engaged between a sender Sen and a receiver Rec. We say that $\pi$ is a
defensibly-private string oblivious transfer protocol, if for every PPT adversary
$\mathcal{A}$ the following holds,

1. $\{\Gamma(\mathrm{View}_{\mathcal{A}}[\mathcal{A}(1^n), \mathrm{Rec}(1^n, U)], U)\} \stackrel{c}{\approx} \{\Gamma(\mathrm{View}_{\mathcal{A}}[\mathcal{A}(1^n), \mathrm{Rec}(1^n, U)], U')\}$,

where $\Gamma(v, *)$ is set to $(v, *)$ if following the execution $\mathcal{A}$ outputs a good defense
for $\pi$, and $\bot$ otherwise, and $U$ and $U'$ are independent random variables
uniformly distributed over $\{0,1\}$. This property is referred to as defensibly private
with respect to a corrupted sender.

2. $\{\Gamma(\mathrm{View}_{\mathcal{A}}[\mathrm{Sen}(1^n, (U_0^n, U_1^n)), \mathcal{A}(1^n)], U_{1-b}^n)\} \stackrel{c}{\approx} \{\Gamma(\mathrm{View}_{\mathcal{A}}[\mathrm{Sen}(1^n, (U_0^n, U_1^n)), \mathcal{A}(1^n)], U^n)\}$,

where $\Gamma(v, *)$ is set to $(v, *)$ if following the execution $\mathcal{A}$ outputs a good defense
for $\pi$, and $\bot$ otherwise, $b$ is Rec's input in this defense, and $U_0^n, U_1^n, U^n$ are
independent random variables uniformly distributed over $\{0,1\}^n$. This property is
referred to as defensibly private with respect to a corrupted receiver.

In our construction from Sect. 3, we will rely on an OT protocol that is 
sender private and defensibly private with respect to a corrupted receiver. In [27], 
Haitner et al. showed how to transform any semi-honest bit-OT to one that is 
defensibly private with respect to a corrupted receiver and malicious secure with 
respect to a corrupted sender. More formally, the following Lemma is implicit in 
the work of [27]. 

Lemma 21 (Implicit in Theorem 4.1 and Corollary 5.3 [27]). Assume
the existence of a semi-honest oblivious transfer protocol $\pi$. Then there exists
an oblivious transfer protocol $\widehat{\pi}$ that is defensible-private with respect to the
receiver and sender private that relies on the underlying primitive in a black-box
manner.

Now, since sender privacy is implied by malicious security with respect to a cor- 
rupted sender, this transformation yields a bit OT protocol with the required 
security guarantees. Nevertheless, our protocol crucially relies on the fact that 
the underlying OT is a string OT protocol. We therefore show in the full version
[30] how to transform any bit OT to a string OT protocol while preserving
both defensible privacy with respect to a maliciously corrupted receiver and
sender privacy.

At a high level, in order to convert any protocol from semi-honest security to
defensible privacy, Haitner et al. include a coin-tossing stage at the beginning of
the protocol that determines the parties' random tapes. In fact, they let the
coin-tossing also determine the parties' inputs, as they only require OT secure with
respect to random inputs for both the sender and receiver. Now, if the receiver
has to provide a good defense, then it must reveal the input and randomness
used for the semi-honest OT protocol and prove consistency relative to the values
generated in the coin-tossing stage. Due to the fact that the commitment schemes
that are used in the coin-tossing stage are statistically binding, the probability
that a malicious receiver can deviate from the protocol and provide a good
defense is negligible. Using this fact, Haitner et al. argued that the probability
that a malicious receiver outputs a good defense and guesses the sender's other
input is negligible. Next, to obtain sender private oblivious transfer they first
transformed an OT protocol that is defensible-private against malicious receivers
to one that is maliciously secure, and then exploited the symmetry of OT in order
to obtain a protocol that is sender-private. The first transformation relies on the
cut-and-choose approach to ensure that the receiver provides a valid defense;
then, using the fact that defensible privacy hides the sender's other input,
they argued that it is receiver-private.

2.2 UC Commitment Schemes 

The notion of UC commitments was introduced by Canetti and Fischlin in [7].
The formal description of functionality $\mathcal{F}_{COM}$ is depicted in Fig. 2.
Functionality $\mathcal{F}_{COM}$

Functionality $\mathcal{F}_{COM}$ communicates with sender Sen and receiver Rec, and
adversary $\mathcal{S}$.

1. Upon receiving input (commit, sid, $m$) from Sen where $m \in \{0,1\}^*$,
internally record (sid, $m$) and send message (sid, Sen, Rec) to the adversary.
Upon receiving approve from the adversary send sid to Rec. Ignore subsequent
(commit, ., ., .) messages.

2. Upon receiving (reveal, sid) from Sen, where a tuple (sid, $m$) is recorded, send
message $m$ to adversary $\mathcal{S}$ and Rec. Otherwise, ignore.


Fig. 2. The string commitment functionality.
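
The commitment functionality of Fig. 2 can be modeled in the same toy style as a trusted party. The class and method names below are assumptions made for illustration, and "ignore" is modeled by returning `None`.

```python
class IdealCommitment:
    """Toy model of the commitment functionality in Fig. 2 (illustrative names)."""

    def __init__(self):
        self.record = None  # (sid, m) once a commitment has been made

    def commit(self, sid, m):
        # Only the first commit message is recorded; later ones are ignored.
        if self.record is not None:
            return None
        self.record = (sid, m)
        # Notification for the adversary; it reveals nothing about m.
        return (sid, "Sen", "Rec")

    def approve(self):
        # After the adversary approves, the receiver learns only sid.
        return self.record[0] if self.record is not None else None

    def reveal(self, sid):
        # The recorded message is released to S and Rec; otherwise ignore.
        if self.record is not None and self.record[0] == sid:
            return self.record[1]
        return None
```

Hiding corresponds to the receiver seeing only `sid` before the reveal, and binding to the fact that the recorded message can never be overwritten.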


2.3 Extractable Commitments 

Our result in the static setting requires the notion of (static) extractable UC
commitments, which is a weaker security property than UC commitments in the
sense that it does not require equivocality. In what follows, we introduce the
definition for the ideal functionality $\mathcal{F}_{EXTCOM}$ from [44]. Towards introducing
this definition, Maji et al. introduced some notions first. More concretely,

Definition 25. A protocol is a syntactic commitment protocol if:

- It is a two phase protocol between a sender and a receiver (using only plain
communication channels).

- At the end of the first phase (commitment phase), the sender and the receiver
output a transcript trans. Furthermore, the sender receives an output $\gamma$ (which
will be used for opening the commitment).

- In the decommitment phase the sender sends a message $\gamma$ to the receiver, who
extracts an output value $\mathrm{opening}(\mathrm{trans}, \gamma) \in \{0,1\}^n \cup \{\bot\}$.

Definition 26. Two syntactic commitment protocols $(\omega_L, \omega_R)$ form a pair of
complementary statistically binding commitment protocols if the following hold:

- $\omega_R$ is a statistically binding commitment scheme (with stand-alone security).

- In $\omega_L$, at the end of the commitment phase the receiver outputs a string
$z \in \{0,1\}^n$. If the receiver is honest, it is only with negligible probability that
there exists $\gamma$ such that $\mathrm{opening}(\mathrm{trans}, \gamma) \neq \bot$ and $\mathrm{opening}(\mathrm{trans}, \gamma) \neq z$.

As noted in [44], $\omega_L$ by itself is not an interesting cryptographic goal, as the
sender can simply send the committed string in the clear during the commitment
phase. Nevertheless, in defining $\mathcal{F}_{EXTCOM}$ below, there exists a single protocol
that satisfies both the security guarantees. We are now ready to introduce the
notion of extractable commitments in Fig. 3 that is parameterized by $(\omega_L, \omega_R)$.
We also include a function pp that will be used as an initialization phase to set
up the public parameters for $\omega_L$ and $\omega_R$.
Functionality $\mathcal{F}_{EXTCOM}$ parameterized by $(pp, \omega_L, \omega_R)$

$\mathcal{F}_{EXTCOM}$ is running with parties $P_1, \ldots, P_n$ and an adversary $\mathcal{S}$: Upon receiving a
message (init-commit, sid, ssid, $P_i$, $P_j$) from $P_i$, it first checks if there is a tuple
(public-params, sid, $P_i$, $(pp, sp)$). If yes, it sends (init-commit, sid, ssid, $P_i$, $P_j$) to $P_j$. If
not, it runs $(pp, sp) \leftarrow pp(1^n)$ and sends (init-commit, sid, $P_i$, $pp$) to $P_i$, $P_j$ and $\mathcal{S}$. It stores
(public-params, sid, $P_i$, $(pp, sp)$). We denote $P_i$ by the sender and $P_j$ by the receiver in this
interaction. Next, the functionality behaves as follows, depending on which party is corrupted.

- $P_i$ IS HONEST AND $P_j$ IS HONEST.

Commit Phase: Upon receiving (commit, sid, ssid, $P_i$, $P_j$, $m$) from $P_i$, it internally
simulates a session of $\omega_R$ (simulating both the sender and receiver in $\omega_R$), with
the sender's input fixed to $m$. It gives (transcript, sid, ssid, trans, $\gamma$) to $P_i$ and
(receipt, sid, ssid, $P_i$, $P_j$, trans) to $P_j$ and $\mathcal{S}$.

Reveal Phase: Upon receiving (decommit, sid, ssid, $\cdot$) from the sender, it sends
(decommit, sid, ssid, $P_i$, $P_j$, $z$) to $P_j$ and $\mathcal{S}$.

- $P_i$ IS CORRUPTED AND $P_j$ IS HONEST.

Commit Phase: It runs the commitment $\omega_L$ with the sender, playing the part of the receiver
in $\omega_L$, to obtain (sid, ssid, trans, $z$). It sends (receipt, sid, ssid, $P_i$, $P_j$, trans) to $P_j$
and $\mathcal{S}$.

Reveal Phase: Upon receiving (decommit, sid, ssid, $\gamma$) from the sender, if
$\mathrm{opening}(\mathrm{trans}, \gamma) = z$, it sends (decommit, sid, ssid, $P_i$, $P_j$, $z$) to $P_j$ and $\mathcal{S}$.
Otherwise ignore.

- $P_i$ IS HONEST AND $P_j$ IS CORRUPTED.

Commit Phase: Upon receiving (commit, sid, ssid, $P_i$, $P_j$, $m$) from $P_i$, it runs the
commitment phase of $\omega_R$ with $P_j$, playing the sender's role in $\omega_R$ with $m$ as input. It obtains
the output (trans, $\gamma$) at the end of this phase, and sends (transcript, sid, ssid, trans, $\gamma$)
to $P_i$.

Reveal Phase: Upon receiving (decommit, sid, ssid) from the sender it sends
(decommit, sid, ssid, $P_i$, $P_j$, $(\gamma, z)$) to $P_j$ and $\mathcal{S}$.

The functionality does not do anything when both the sender and the receiver are corrupted.


Fig. 3. Extractable commitment functionality.


Implementing $\mathcal{F}_{EXTCOM}$ in the CRS Model. We briefly sketch how to implement
the extractable commitment functionality in the $\mathcal{F}_{CRS}$-hybrid model based on the
CPA-security of any PKE. Namely, the CRS will be set to a public key generated
using the key-generation function of the PKE scheme. To commit, a sender
simply encrypts the message using the public key in the CRS and sends the
ciphertext to the receiver. We can achieve extraction by setting the CRS to a
public key for which the secret key is available to the extractor (in this case, the
extractor is the $\mathcal{F}_{EXTCOM}$ functionality). Hiding follows from the CPA-security of
the encryption scheme. A formal description and proof of this construction can
be found in the full version of this paper [30].
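
The construction sketched above can be illustrated with a toy instantiation. The code below uses textbook ElGamal over a tiny group purely as a placeholder for "any CPA-secure PKE"; the group parameters are far too small for security, and opening a commitment by revealing the message together with the encryption randomness is an assumption of this sketch rather than a detail stated in the text.

```python
import secrets

# Toy group parameters (assumption): P = 2Q + 1 with P, Q prime, G of order Q.
P, Q, G = 467, 233, 4


def crs_gen():
    """Return (crs, trapdoor): an ElGamal public key and its secret key."""
    sk = secrets.randbelow(Q - 1) + 1
    return pow(G, sk, P), sk


def encrypt(pk, m, r):
    """Textbook ElGamal encryption of m in {1, ..., P-1} with randomness r."""
    return (pow(G, r, P), (m * pow(pk, r, P)) % P)


def commit(pk, m):
    """Commit by encrypting m under the public key found in the CRS."""
    r = secrets.randbelow(Q)
    return encrypt(pk, m, r), (m, r)  # (commitment, opening)


def verify(pk, com, opening):
    """Receiver's check: re-encrypt with the revealed message and randomness."""
    m, r = opening
    return encrypt(pk, m, r) == com


def extract(sk, com):
    """Extractor's view: decrypt the commitment using the CRS trapdoor."""
    c1, c2 = com
    return (c2 * pow(pow(c1, sk, P), -1, P)) % P
```

Extraction is straight-line: whoever holds the secret key behind the CRS reads the committed value directly from the ciphertext, without rewinding the committer.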
3 Static UC Secure Computation

In this section we prove the feasibility of UC secure computation based on
semi-honest OT and extractable commitments, where the latter can be constructed
based on two-round semi-honest OT (see Sects. 2.1 and 2.3 for more details).
More concretely, we show how to transform any statically semi-honest secure
OT into one that is secure in the presence of malicious adversaries, given only
black-box access to the underlying semi-honest OT protocol. Our protocol is
a variant of the protocol by Lin and Pass from [37] (which in turn is a variant
of the protocol of [27]). In particular, in [37], the authors rely on a strong
variant of a commitment scheme known as a CCA-secure commitment in order
to achieve extraction. We observe that it is not required to use the full power
of such commitments, or for that matter UC commitments. Specifically, using
a weaker primitive that only implies straight-line input extractability makes it
possible to rely solely on semi-honest OT. An important weakening in our commitment
scheme compared to the CCA-secure commitments from [37] is that we allow invalid
commitments to be made by the adversary. We remark here that the work of [37]
relies on string OT that is secure against malicious senders and states that the
work of [26] provides a black-box construction of such a protocol starting from
a semi-honest bit OT. However, the work of [26] only shows how to construct
a bit OT secure against malicious senders, where the proof crucially relies on
the sender's inputs being only bits. We provide a transformation and complete
analysis from bit OT to string OT for the weaker notion of defensible privacy,
as this is sufficient for our work. Finally, combining our UC OT protocol with
the [33] protocol, we obtain a statically UC secure protocol for any well-formed
functionality (see definition in [9]). Namely,

Theorem 31. Assume the existence of static semi-honest oblivious transfer.
Then for any multi-party well-formed functionality $\mathcal{F}$, there exists a protocol that
UC realizes $\mathcal{F}$ in the presence of static, malicious adversaries in the
$\mathcal{F}_{EXTCOM}$-hybrid model using black-box access to the oblivious transfer protocol.

We remark here that the work of [12] shows how, starting from a semi-honest
oblivious transfer, it is possible to obtain a black-box construction of an OT
protocol that is secure against stand-alone static adversaries in the $\mathcal{F}_{COM}$-hybrid
model. It is noted in [12] that the (high-level) analysis provided in the work might
be extendable to the UC setting (cf. Footnote 10 in [12]). Furthermore, in the
static setting, it is conceivable that $\mathcal{F}_{COM}$ can be directly realized in the
$\mathcal{F}_{EXTCOM}$-hybrid model using the notion of extractable trapdoor commitments [47].
We do not pursue this approach and instead directly realize OT in the
$\mathcal{F}_{EXTCOM}$-hybrid model. While the previous works of [12] and [27] require a
three step transformation, our transformation is one shot and therefore more direct.

It seems possible to generalize our theorem to multi-session functionalities.
Analogous to [7], this would allow us to extend our corollaries to the Global CRS
model by additionally assuming a CCA-secure encryption scheme; we leave this as
future work.



3.1 Static UC Oblivious Transfer 

In the following, we discuss a secure implementation of the oblivious transfer
functionality (see Fig. 1) with static, malicious security in the $\mathcal{F}_{EXTCOM}$-hybrid
model (where $\mathcal{F}_{EXTCOM}$ is stated formally in Fig. 3). Our goal in this section
is to show that the security of malicious UC OT can be based on UC semi-honest
OT, denoted by $\pi^{SH}_{OT}$, and extractable commitments. Our result is shown
in two phases. At first we compile the semi-honest OT protocol into a new
protocol with the security properties that are specified in Sect. 2.1, extending
the [27] transformation into string OT; denote the compiled OT protocol by
$\widehat{\pi}_{OT}$. Next, we use $\widehat{\pi}_{OT}$ in order to construct a new protocol $\pi_{OT}$ that is secure
in the presence of malicious adversaries. Details follow.

Protocol 1 (Protocol $\pi_{OT}$ with Static Security)

Input: The sender Sen has input $(v_0, v_1)$ where $v_0, v_1 \in \{0,1\}^n$ and the receiver Rec
has input $u \in \{0,1\}$.

The protocol:

1. Coin Tossing:

- Receiver's random tape generation: The parties use a coin tossing protocol in
order to generate the inputs and random tapes for the receiver.

• The receiver commits to $20n$ strings of appropriate length, denoted by
$a^1_{Rec}, \ldots, a^{20n}_{Rec}$, by sending $\mathcal{F}_{EXTCOM}$ the message (commit, sid, $ssid_i$, $a^i_{Rec}$)
for all $i \in [20n]$.

• The sender responds with $20n$ random strings of appropriate length
$b^1_{Rec}, \ldots, b^{20n}_{Rec}$.

• The receiver computes $r^i_{Rec} = a^i_{Rec} \oplus b^i_{Rec}$ and then interprets $r^i_{Rec} = c_i \| \tau^i_{Rec}$
where $c_i$ determines the receiver's input for the $i$th OT protocol, whereas $\tau^i_{Rec}$
determines the receiver's random tape used for this execution.

- Sender's random tape generation: The parties use a coin tossing protocol in
order to generate the inputs and random tapes for the sender.

• The sender commits to $20n$ strings of appropriate length, denoted by
$a^1_{Sen}, \ldots, a^{20n}_{Sen}$, by sending $\mathcal{F}_{EXTCOM}$ the message (commit, sid, $ssid_i$, $a^i_{Sen}$)
for all $i \in [20n]$.

• The receiver responds with $20n$ random strings of appropriate length
$b^1_{Sen}, \ldots, b^{20n}_{Sen}$.

• The sender computes $r^i_{Sen} = a^i_{Sen} \oplus b^i_{Sen}$ and then interprets
$r^i_{Sen} = s^0_i \| s^1_i \| \tau^i_{Sen}$ where $(s^0_i, s^1_i)$ determine the sender's input for the $i$th OT
protocol, whereas $\tau^i_{Sen}$ determines the sender's random tape used for this
execution.
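
One instance of the coin tossing in Stage 1 can be sketched as follows. The commitment step is elided (the string `a` would be sent to the extractable commitment functionality rather than kept in a local variable), and the encoding of the choice bit as the low bit of the first byte is an assumption made only for this illustration.

```python
import secrets


def xor(a, b):
    """Bytewise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toss(length):
    """One coin toss: the tape owner's committed string a, the other
    party's response b sent in the clear, and the resulting tape r."""
    a = secrets.token_bytes(length)  # committed by the tape owner
    b = secrets.token_bytes(length)  # chosen by the other party
    return a, b, xor(a, b)


def parse_receiver_tape(r):
    """Split r = c || tau into an OT choice bit and a random tape.

    Assumed encoding: c is the low bit of the first byte.
    """
    return r[0] & 1, r[1:]
```

Since `b` is public, opening the commitment to `a` later in the cut-and-choose stages reveals the whole tape `r` and lets the other party recheck the corresponding OT execution.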

2. Oblivious Transfer:

- The parties participate in $20n$ executions of the OT protocol $\widehat{\pi}_{OT}$ with the
corresponding inputs and random tapes obtained from Stage 1. Let the output of the
receiver in the $i$th execution be $\tilde{s}_i$.

3. Cut-and-choose:

- Sen chooses a random subset $q_{Sen} = (q^1_{Sen}, \ldots, q^n_{Sen}) \in \{1, \ldots, 20\}^n$ and sends it
to Rec. The string $q_{Sen}$ is used to define a set of indices $\Gamma_{Sen} \subset \{1, \ldots, 20n\}$ of
size $n$ in the following way: $\Gamma_{Sen} = \{20i - q^i_{Sen}\}_{i \in [n]}$. The receiver then opens the
commitments from Stage 1 that correspond to the indices within $\Gamma_{Sen}$, namely,
the receiver decommits $a^i_{Rec}$ for all $i \in \Gamma_{Sen}$. Sen checks that the decommitted
values are consistent with the inputs and randomness used for the OTs in Stage 2
by the receiver, and aborts in case of a mismatch.

- Rec chooses a random subset $q_{Rec} = (q^1_{Rec}, \ldots, q^n_{Rec}) \in \{1, \ldots, 20\}^n$ and sends it
to Sen. The string $q_{Rec}$ is used to define a set of indices $\Gamma_{Rec} \subset \{1, \ldots, 20n\}$ of
size $n$ in the following way: $\Gamma_{Rec} = \{20i - q^i_{Rec}\}_{i \in [n]}$. The sender then opens the
commitments from Stage 1 that correspond to the indices within $\Gamma_{Rec}$, namely,
the sender decommits $a^i_{Sen}$ for all $i \in \Gamma_{Rec}$. Rec checks that the decommitted
values are consistent with the inputs and randomness used for the OTs in Stage 2
by the sender, and aborts in case of a mismatch.

- Rec commits to another subset $\Gamma \subset [20n]$ denoted by $(\Gamma^1, \ldots, \Gamma^n)$, by sending
$\mathcal{F}_{EXTCOM}$ the message (commit, sid, $ssid'_i$, $\Gamma^i$) for all $i \in [n]$. (The sender will
reveal its inputs and randomness that are used in Stage 2 that correspond to the
indices in $\Gamma$ later in Stage 5.)
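
The way a challenge string selects the positions to open can be sketched in a few lines. The helper names are illustrative; the only content taken from the protocol is the formula $20i - q^i$ for $i \in [n]$, which picks one position out of every block of 20 consecutive OT executions.

```python
import secrets


def challenge(n):
    """Sample q = (q^1, ..., q^n) with each q^i uniform in {1, ..., 20}."""
    return [secrets.randbelow(20) + 1 for _ in range(n)]


def index_set(q):
    """Map the challenge q to the index set {20*i - q^i} for i = 1, ..., n."""
    return {20 * i - q_i for i, q_i in enumerate(q, start=1)}
```

Because the positions fall in distinct blocks, the resulting set always has exactly $n$ elements, matching the size stated in the protocol.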

4. Combiner:

- Let $\Delta = [20n] - \Gamma_{Rec} - \Gamma_{Sen}$. Then for every $i \in \Delta$, the receiver computes
$\alpha_i = u \oplus c_i$ and sends it to the sender.

- The sender computes a $10n$-out-of-$18n$ secret sharing of $v_0$; denote the shares by
$\{\rho^0_i\}_{i \in \Delta}$. Analogously, it computes a $10n$-out-of-$18n$ secret sharing of $v_1$; denote
the shares by $\{\rho^1_i\}_{i \in \Delta}$. The sender computes $\beta^b_i = \rho^b_i \oplus s^{b \oplus \alpha_i}_i$ for all $b \in \{0,1\}$
and $i \in \Delta$, and sends the outcome to the receiver.

- The receiver computes $\tilde{\rho}_i = \beta^u_i \oplus \tilde{s}_i$ for all $i \in \Delta$. Denote by $\tilde{\rho}$ these concatenated
bits.
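
An honest run of the combiner can be traced end to end with a short sketch. Several choices here are assumptions made for illustration: Shamir sharing over a prime field stands in for the secret sharing scheme, shares and masks are 61-bit integers so that XOR masking is well defined, the OT executions are replaced by direct table lookups, and error correction is skipped because no party deviates.

```python
import secrets

P = 2**61 - 1  # prime field modulus (assumption for this sketch)


def share(secret, t, m):
    """Shamir sharing: any t of the m shares determine the secret."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(t - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc

    return [f(x) for x in range(1, m + 1)]


def reconstruct(points):
    """Lagrange interpolation at 0 from a list of (x, y) pairs."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def combiner(v, u, n=1):
    """Honest execution of Stage 4 over 18n surviving OT positions."""
    m, t = 18 * n, 10 * n
    c = [secrets.randbelow(2) for _ in range(m)]            # receiver OT inputs
    s = [(secrets.randbits(61), secrets.randbits(61)) for _ in range(m)]
    s_recv = [s[i][c[i]] for i in range(m)]                 # receiver OT outputs
    alpha = [u ^ c[i] for i in range(m)]                    # sent by the receiver
    rho = [share(v[0], t, m), share(v[1], t, m)]
    beta = [[rho[b][i] ^ s[i][b ^ alpha[i]] for i in range(m)] for b in (0, 1)]
    rec = [beta[u][i] ^ s_recv[i] for i in range(m)]        # unmasked shares of v_u
    return reconstruct([(i + 1, rec[i]) for i in range(t)])
```

The mask on $\beta^u_i$ is $s^{u \oplus \alpha_i}_i = s^{c_i}_i$, which is exactly the string the receiver obtained from the $i$th OT, while the shares of the other input stay masked by strings the receiver did not receive.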

5. Final cut-and-choose:

- The receiver decommits $\Gamma$ and the sender sends the inputs and randomness it
used in Stage 2 for the coordinates that correspond to $\Delta \cap \Gamma$. (Note that the
sender need only reveal the indices that were not decommitted in Stage 3.) Rec
checks that the sender's values are consistent with the inputs and randomness
used for the OTs in Stage 2 by the sender, and aborts in case of a mismatch.

- The receiver checks whether $(\tilde{\rho}_i)_{i \in \Delta}$ agrees with some codeword $w \in W_{18n,10n}$
on $17n$ locations (where the code $W_{18n,10n}$ is induced by the secret sharing
construction that we use in Stage 4). Recall that the minimum distance of the code
$W_{18n,10n}$ is at least $18n - 10n + 1 > 8n$, which implies that there will be at most one
such codeword $w$. Furthermore, since we can correct up to $\frac{18n - 10n}{2} = 4n$ errors,
any word that is $17n$-close to a codeword can be efficiently recovered using the
Berlekamp-Welch algorithm. The receiver outputs that $w$ as its output in the
OT protocol. If no such $w$ exists, the receiver returns a default value.
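
The decoding parameters used in the last step can be checked with a short calculation. It assumes that the sharing is Shamir's scheme, so that $W_{18n,10n}$ is a Reed-Solomon code of length $18n$ and dimension $10n$; this is an assumption of the worked example, not a statement taken from the protocol description.

```latex
\begin{align*}
d_{\min} &= 18n - 10n + 1 = 8n + 1
  && \text{(Reed--Solomon codes meet the Singleton bound)}\\
\left\lfloor \frac{d_{\min} - 1}{2} \right\rfloor &= 4n
  && \text{(unique decoding radius)}\\
18n - 17n &= n \le 4n
  && \text{(errors allowed by the receiver's check)}
\end{align*}
```

Uniqueness follows in the same way: two codewords that each agree with the received word on $17n$ positions agree with each other on at least $16n$ positions, so they differ on at most $2n < d_{\min}$ positions and must coincide.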

Theorem 32. Assume that $\pi^{SH}_{OT}$ is static semi-honest secure and that the
compiled $\widehat{\pi}_{OT}$ is secure according to Lemma 21. Then Protocol 1 UC realizes $\mathcal{F}_{OT}$ in
the presence of static malicious adversaries in the $\mathcal{F}_{EXTCOM}$-hybrid model using
black-box access to the oblivious transfer protocol.

Recall that our protocol relies on the existence of semi-honest OT and
extractable commitments, and that the latter can be constructed based on any
two-round semi-honest OT, e.g., [21], which implies PKE (see Sects. 2.1 and 2.3
for more details). An immediate corollary of Theorem 32 is the following.

Corollary 33. Assume the existence of two-round static semi-honest oblivious
transfer. Then there exists a protocol that securely realizes $\mathcal{F}_{OT}$ in the presence
of static malicious adversaries in the CRS model using black-box access to the
oblivious transfer protocol.
A High Level Proof. We first provide an overview of the security proof; the
complete proof is found in [30]. Loosely speaking, in case the receiver is corrupted
the simulator plays the role of the honest sender in Stages 1-4. Next in Stage 5,
the simulator extracts the receiver's input $u$. Specifically, the simulator extracts
all the committed values of the receiver within Stage 1 (relying on the fact that
the commitment scheme is extractable), and then uses these values in order to
obtain the inputs for the OT executions in Stage 2. Upon completing Stage 2,
the simulator records the coordinates for which the receiver deviates from the
prescribed input and random tape chosen in the coin tossing phase. Denoting
this set of coordinates by $\Phi$, we recall that a malicious receiver may obtain
both of the sender's inputs with respect to the OT executions that correspond
to the coordinates within $\Phi$ and $\Gamma$. On the other hand, it obtains only one of
the two inputs with respect to the rest of the OT executions that correspond
to the coordinates within $\Delta - \Phi - \Gamma$. Consequently, the simulator checks how
many shares of $v_0$ and $v_1$ are obtained by the receiver and proceeds accordingly.
In more detail,

- If the receiver obtains more than $10n$ shares of both inputs then the simulator
halts and outputs fail (we prove in [30] that this event only occurs
with negligible probability).

- If the receiver obtains less than $10n$ shares of both inputs then the simulator
picks two random values for $v_0$ and $v_1$ of the appropriate length and completes
the interaction, playing the role of the honest sender on these values. Note
that in this case the simulator does not need to call the ideal functionality.

- Finally, if the receiver obtains more than $10n$ shares for only one input
$u \in \{0,1\}$, then the simulator sends $u$ to the ideal functionality $\mathcal{F}_{OT}$ and obtains
$v_u$. The simulator then sets $v_{1-u}$ as a random string of the appropriate length
and completes the interaction by playing the role of the honest sender on these
values.
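
The case analysis above can be summarized as a small decision function. The function name and return values are illustrative; in particular, treating a count of exactly $10n$ shares the same as "less than $10n$" is an assumption of this sketch, since the text only distinguishes "more than" and "less than".

```python
def simulator_decision(shares_of_v0, shares_of_v1, n):
    """Decide the simulator's action from the number of shares of each
    sender input that the corrupted receiver can obtain."""
    threshold = 10 * n
    got0 = shares_of_v0 > threshold
    got1 = shares_of_v1 > threshold
    if got0 and got1:
        # Both inputs are recoverable: the simulation cannot succeed.
        return ("fail", None)
    if not got0 and not got1:
        # Neither input is recoverable: use random values, no ideal call.
        return ("random", None)
    # Exactly one input is recoverable: query the ideal OT with that bit.
    return ("query", 0 if got0 else 1)
```

Only the third branch involves the ideal functionality, and it is there that the extracted bit $u$ determines which of the sender's inputs the simulator learns.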

Recall that the only difference between the simulation and the real execution is 
in the way the messages in Stage 4 are generated. Specifically, in the simulation 
a value $u$ is extracted from the malicious receiver and then fed to the $\mathcal{F}_{OT}$
functionality. The simulation is then completed based on the output returned 
from the functionality. Intuitively, the cut-and-choose mechanism ensures that 
the receiver cannot deviate from the honest strategy in Stage 2 in more than n 
OT sessions without getting caught with overwhelming probability. Moreover, 
the defensible privacy of the OT protocol implies that the receiver can learn at 
most one of the two inputs of the sender relative to the OT executions in Stage 2 
for which the receiver proceeded honestly. 

In case the sender is corrupted, the simulator's strategy is to play the role
of the honest receiver until Stage 5 where the simulator extracts the sender's
inputs. More specifically, the simulator first extracts the sender's input for the
OT executions in Stage 1 (relying on the fact that the commitment scheme is
extractable). Next, the simulator extracts the shares $\{\rho^0_i\}_{i \in \Delta}$ and $\{\rho^1_i\}_{i \in \Delta}$ that
correspond to inputs $v_0$ and $v_1$. To obtain the actual values the simulator checks
if these shares agree with some codeword relative to $16n$ locations. That is,

- Let $w_0$ and $w_1$ denote the corresponding codewords (if there are no such
codewords that agree with the extracted shares of $v_0$ and $v_1$ on $16n$ locations
then the simulator uses a default codeword instead). Next, the simulator checks
$w_0$ and $w_1$ against the final cut-and-choose. If any of the shares from $w_b$ are
inconsistent with the shares that are opened by the sender in the final
cut-and-choose, then $v_b$ is set to a default value; otherwise $v_b$ is the value
corresponding to the shared secret.

Finally, the simulator sends $(v_0, v_1)$ to the ideal functionality $\mathcal{F}_{OT}$. Security
in this case is reduced to the privacy of the receiver. In addition, the difference
between the simulator's strategy and the honest receiver's strategy is that the
simulator extracts both of the sender's inputs for all $i \in \Delta - \Phi$ and then finds
codewords that are $16n$-close to the extracted values, whereas the honest receiver
finds a codeword that is $17n$-close based on the inputs it received in Stages 2
and 5, and returns it. We thus prove that the value $v_u$ extracted by the simulator
is identical to the reconstructed output of the honest receiver, relying on the
properties of the secret sharing scheme.

4 One-Sided Adaptive UC Secure Computation 

In the two-party one-sided adaptive setting, at most one of the parties is adap- 
tively corrupted [29,35]. In this section we provide a simple transformation of 
our static UC secure protocol from Sect. 3 to a two-party UC-secure protocol 
that is secure against one-sided adaptive corruption. Our first observation is 
that in Protocol 1 the parties use their real inputs to the OT protocol only in 
Phase 4. Therefore simulation of the first three phases can be easily carried out 
by simply following the honest strategy. On the other hand, simulating messages 
in Phase 4 requires some form of equivocation since if corruption takes place 
after this phase is concluded then the simulator needs to explain this message 
with respect to the real input of the corrupted party. At a high level, we will
transform the protocol so that if no party is corrupted until the end of Phase 4, the
simulator can equivocate the message in Phase 4. We explain how to achieve
equivocation later. First, we describe our simulator: In case either party is stat- 
ically corrupted the simulation for Protocol 1 follows the strategy of the honest 
party until Phase 4, where the simulator extracts the corrupted party’s input 
relying on the fact that it knows the adversary’s committed input in Phase 1. 
Therefore, the same proof follows in case the adversary adaptively corrupts one 
of the parties at any point before Phase 4, as the simulator can pretend that cor- 
ruption took place statically. On the other hand, if corruption takes place after 
Phase 4, then the simulator equivocates the communication. It is important to 
note that while in the plain model any statically secure protocol can be compiled 
into one-sided secure protocol by encrypting its entire communication, it is not 
clear that this is the case in the UC setting due to the additional setup, e.g., 
a CRS that may depend on the identity of the corrupted party. Nevertheless, 
in Phase 4 the parties only run a combiner for which the computation does not 
involve any usage of the CRS (which is induced by the extractable commitment). 
Therefore, the proof follows. 
A common approach to achieve equivocation is to rely on non-committing
encryption schemes (NCE) [6,11,16], that allow secure communication in the 
presence of adaptive attacks. This powerful tool has been constructed while 
relying on (a variant of) simulatable PKE schemes, which, roughly speaking, 
allows for both the public-key and the ciphertexts to be generated obliviously 
without the knowledge of the plaintext or the secret key [11,16]. Notably, these 
constructions achieve a stronger notion of security where both parties may be 
adaptively corrupted (also referred to as fully adaptive). Our second observation 
is that it is sufficient to rely on a weaker variant of NCE, namely, one that is 
secure against only one-sided adaptive corruption. 

In particular, we take advantage of a construction presented in [6] and later 
refined in [16], that achieves receiver equivocation under the assumption of semi- 
honest OT. We will briefly describe it now. Recall that in the fully adaptive case, 
the high-level idea is for the sender and receiver to mutually agree on a random 
bit, which is then used by the sender to determine which of two random strings 
to mask its message. The process of agreeing on a bit requires the ability to 
both obliviously sample a public-key without the knowledge of the secret key, 
as well as the ability to obliviously sample a ciphertext without the knowledge 
of the corresponding plaintext. In the simpler one-sided scenario, Canetti et al. 
observed that an oblivious transfer protocol can replace the oblivious generation 
of the public key. Specifically, the NCE receiver sends two public keys to the
sender, and then the parties invoke an OT protocol where the NCE receiver
plays the role of the OT sender and enters the corresponding secret keys. To allow
equivocation for the NCE sender, the OT must enable equivocation with respect
to the OT receiver. The OT protocol of [21] is an example of such a protocol. Here
the OT receiver can pick the two ciphertexts so that it knows both plaintexts. 
Then equivocation is carried out by declaring that the corresponding ciphertext 
is obliviously sampled. 

The advantage of this approach is that it removes the requirement of gen- 
erating the public key obliviously, as now the randomness for its generation is 
split between the parties, where anyway only one of them is corrupted. This 
implies that the simulator can equivocate the outcome of the protocol execution
without giving the adversary the ability to verify it. To conclude, it is
possible to strengthen the security of Protocol 1 into the one-sided setting by
simply encrypting the communication within the combiner phase using one-sided
NCE, which in turn can be constructed based on PKE with oblivious ciphertext
generation. This implies the following theorem, which further implies black-box
one-sided UC secure computation from enhanced trapdoor permutations.

Theorem 4.1. Assume the existence of PKE with oblivious ciphertext generation.
Then for any two-party well-formed functionality $\mathcal{F}$, there exists a protocol
that UC realizes $\mathcal{F}$ in the presence of one-sided adaptive, malicious adversaries
in the CRS model using black-box access to the PKE.

5 Adaptive UC Secure Computation

In this section we demonstrate the feasibility of UC secure commitment schemes 
based on PKE with oblivious ciphertext generation (namely, where it is possible 
to obliviously sample the ciphertext without knowing the plaintext). Our con- 
struction is secure even in the presence of adaptive corruptions and is the first to 
achieve the stronger notion of adaptive security based on this hardness assump- 
tion. Plugging-in our UC commitment protocol into the transformation of [12] 
that generates adaptive malicious OT given adaptive semi-honest OT and UC 
commitments, implies an adaptively UC secure oblivious transfer protocol with 
malicious security based on semi-honest adaptive OT and PKE with oblivious 
ciphertext generation using only black-box access to the semi-honest OT and 
the PKE. Formally,

Theorem 5.1. Assume the existence of adaptive semi-honest oblivious transfer
and PKE with oblivious ciphertext generation. Then for any multi-party well-formed
functionality $\mathcal{F}$, there exists a protocol that UC realizes $\mathcal{F}$ in the presence
of adaptive, malicious adversaries in the CRS model using black-box access to
the oblivious transfer protocol and the PKE.

Noting that simulatable PKE implies both semi-honest adaptive OT [9,11] and
PKE with oblivious ciphertext generation, we derive the following corollary
(where simulatable PKE implies oblivious sampling of both public keys and
ciphertexts):

Corollary 5.2. Assume the existence of simulatable PKE. Then for any multi-party
well-formed functionality $\mathcal{F}$, there exists a protocol that UC realizes $\mathcal{F}$ in
the presence of adaptive, malicious adversaries in the CRS model using black-box
access to the simulatable PKE.

This in particular improves the result from [14], which relies on simulatable PKE in
a non-black-box manner. Note also that our UC commitment can be constructed
using a weaker notion than simulatable PKE where the inverting algorithms can
require a trapdoor. This notion is denoted by trapdoor simulatable PKE [11] and
can be additionally realized based on the hardness assumption of factoring Blum
integers. This assumption, however, requires that we modify our commitment
scheme so that the CRS includes $3n+1$ public keys of the underlying PKE instead
of just one, as otherwise the reduction to the security of the PKE does not follow
for multiple ciphertexts. Specifically, at the cost of a linear blowup (in the security
parameter) of the CRS, we obtain adaptively secure UC commitments under
a weaker assumption. Now, since trapdoor simulatable PKE implies adaptive
semi-honest OT [11], the following holds:

Corollary 5.3. Assume the existence of trapdoor simulatable PKE. Then for any
multi-party well-formed functionality $\mathcal{F}$, there exists a protocol that UC realizes
$\mathcal{F}$ in the presence of adaptive, malicious adversaries in the CRS model using
black-box access to the trapdoor simulatable PKE.

Note that, since the best known general assumption for realizing adaptive
semi-honest OT is trapdoor simulatable PKE, this corollary gives evidence that the
assumptions for adaptive semi-honest OT are sufficient for adaptive UC security
and makes a step towards identifying the minimal assumptions for achieving UC
security in the adaptive setting. To conclude, we note that enhanced trapdoor
permutations, which imply PKE with oblivious ciphertext generation, imply the
following result:

Theorem 5.4. Assume the existence of enhanced trapdoor permutations. Then
$\mathcal{F}_{\mathrm{COM}}$ (cf. Fig. 2) can be UC realized in the CRS model in the presence of adaptive
malicious adversaries.

5.1 UC Commitments from PKE with Oblivious Ciphertext Generation

In this section we demonstrate the feasibility of adaptively secure UC commitments
for the message space $m \in \{0,1\}$ from any public-key encryption scheme
$\Pi = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec}, \widetilde{\mathsf{Enc}}, \widetilde{\mathsf{Enc}}^{-1})$ with oblivious ciphertext generation
(cf. Definition 2.1) in the common reference string (CRS) model. In this model [7] the
parties have access to a CRS chosen from a specified trusted distribution $\mathcal{D}$.
This is captured via the ideal functionality $\mathcal{F}_{\mathrm{CRS}}^{\mathcal{D}}$ (see [30] for the definition).
We note that we use $\Pi$ in two places in our protocol. First, in the encoding
phase (where the commitments are computed by the sender) and then in the
coin-tossing phase (where the commitments are computed by the receiver). Our
complete construction can be found in Fig. 4. Next, we prove

Theorem 5.5. Assume that $\Pi = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec}, \widetilde{\mathsf{Enc}}, \widetilde{\mathsf{Enc}}^{-1})$ is a PKE with
oblivious ciphertext generation. Then protocol $\pi_{\mathrm{COM}}$ (cf. Fig. 4) UC realizes $\mathcal{F}_{\mathrm{COM}}$
in the CRS model in the presence of adaptive malicious adversaries.

A High Level Proof. Intuitively, security requires proving both hiding and
binding in the presence of static and adaptive corruptions. The hiding property
follows from the IND-CPA security of the encryption scheme combined with the
fact that the receiver only sees $n$ shares in an $n$-out-of-$(3n+1)$ secret sharing of the
message in the commit phase. On the other hand, proving binding is much more
challenging and reduces to the fact that a corrupted sender cannot successfully
predict exactly the $n$ indices from $\{1, \ldots, 3n+1\}$ that will be chosen in the
coin-tossing protocol. In fact, if it could identify these $n$ indices, then it would be
possible for the adversary to break binding. An important information-theoretic
argument that we prove here is that for a fixed encoding phase, no adversary
can equivocate on two continuations from the encoding phase with different
outcomes of the coin-tossing phase. Put differently, for any given encoding
phase there is exactly one outcome of the coin-tossing phase that will allow
equivocation. Given this claim, binding now follows from the IND-CPA security
of the encryption scheme used in the coin-tossing phase. In addition, recall that
in the UC setting the scheme must also support a simulation that allows straight-line
extraction and equivocation. At a high level, the simulator sets the CRS to
public keys for which it knows the corresponding secret keys. This will allow
the simulator to extract all the values encrypted by the adversary. We observe
that the simulator can fix the outcome of the coin-tossing phase to any $n$ indices
of its choice by extracting the random string $\sigma_0$ encrypted by the receiver and
choosing a random string $\sigma_1$ so that $\sigma_0 \oplus \sigma_1$ is a particular string. Next, the
simulator generates secret sharings of both 0 and 1 so that they overlap in the
particular $n$ shares. To commit, the simulator encrypts the $n$ common shares
within the $n$ indices to be revealed (which it knows in advance), and for the rest
of the indices it encrypts two shares, one that corresponds to the sharing of 0 and
the other that corresponds to the sharing of 1. Finally, in the decommit phase,
the simulator reveals the shares that correspond to the real message $m$, and
exploits the invertible sampling algorithm to prove that the other ciphertexts
were obliviously generated.

Protocol $\pi_{\mathrm{COM}}$

CRS: Two independent keys $PK, \widetilde{PK}$ that are in the range of $\mathsf{Gen}(1^n)$.
Sender's Input: A message $m \in \{0,1\}$ and a security parameter $1^n$.

[Commitment phase:]

Encoding phase: The sender chooses a random degree-$n$ polynomial $p(\cdot)$ over a field $\mathbb{F}$
such that $p(0) = m$. Namely, it randomly chooses $a_i \leftarrow \mathbb{F}$ for all $i \in [n]$, sets $a_0 = m$,
and defines the polynomial $p(x) = a_0 + a_1 x + \cdots + a_n x^n$. The sender then creates a
commitment to $m$ as follows. For every $i \in [3n+1]$, it first picks $b_i \leftarrow \{0,1\}$ at random
and then computes the following pair:

    If $b_i = 0$ then
        $c_i^0 = \mathsf{Enc}_{PK}(p(i); t_i)$
        $c_i^1 = r_i$
    else, if $b_i = 1$ then
        $c_i^0 = r_i$
        $c_i^1 = \mathsf{Enc}_{PK}(p(i); t_i)$

where $t_i \leftarrow \{0,1\}^n$ and $r_i \leftarrow \widetilde{\mathsf{Enc}}(\cdot)$ is obliviously sampled. The sender sends
$(c_1^0, c_1^1), \ldots, (c_{3n+1}^0, c_{3n+1}^1)$ to the receiver.

Coin-tossing phase: The sender and receiver interact in a coin-tossing protocol that is
carried out as follows.

1. The receiver sends $c = \mathsf{Enc}_{\widetilde{PK}}(\sigma_0; r_{\sigma_0})$ to the sender, where $\sigma_0$ is a bit string chosen
uniformly at random.

2. The sender picks a bit string $\sigma_1$ of the same length at random and sends it in the clear to the receiver.

3. The receiver decrypts $c$ by revealing $\sigma_0$ and $r_{\sigma_0}$.

Both the sender and the receiver compute $\sigma = \sigma_0 \oplus \sigma_1$ and use $\sigma$ as the random string to
sample a random subset $S \subset [3n+1]$ of size $n$. (Note that such sampling can be done in
a simple way by partitioning the set of coordinates into $n$ sets of triples (where the last set
includes 4 elements) and picking one element per set. Notably, this technique does not imply
that any potential subset of size $n$ will be picked; rather, it ensures that a subset is picked with
probability negligible in $n$, specifically $(1/3)^n$, which suffices for our proof.)

Cut-and-choose phase: The sender decrypts the ciphertexts indexed by $S$ by sending the sequence
$\{b_i, p(i), t_i\}_{i \in S}$. The receiver verifies that all the decryptions are correct and aborts
otherwise.

[Decommitment phase:] Let $T = [3n+1] - S$. The sender reveals its input $m$ and decrypts
all the ciphertexts in $T$. The receiver checks if all the decryptions are correct and aborts
otherwise. Using the $n$ polynomial evaluations revealed relative to $i \in S$ and any additional
polynomial evaluation that was revealed relative to $T$, the receiver reconstructs the polynomial
$p(\cdot)$ (via polynomial interpolation of $n+1$ points). Next, the receiver verifies whether $p(0) = m$,
and that for every $i \in [3n+1]$ the point $p(i)$ is the decrypted value within the $i$-th ciphertext pair.

Fig. 4. UC adaptively secure commitment scheme.
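
The arithmetic behind the encoding, coin-tossing, and decommitment phases can be illustrated with a small sketch. This is only a hedged illustration of the secret-sharing and subset-sampling steps: it omits all encryption and oblivious sampling, the prime `P` is an assumed field size (the protocol leaves the field unspecified), and the function names `share`, `sample_subset`, and `interpolate_at_zero` are hypothetical. The subset sampler consumes the shared string with plain `divmod`, which is slightly biased and would need care in a real instantiation.

```python
import secrets

P = 2**61 - 1  # assumed prime field size; not fixed by the protocol description


def share(m: int, n: int) -> list[int]:
    """Evaluate a random degree-n polynomial p with p(0) = m at x = 1, ..., 3n+1."""
    coeffs = [m] + [secrets.randbelow(P) for _ in range(n)]

    def p(x: int) -> int:
        acc = 0
        for a in reversed(coeffs):  # Horner's rule
            acc = (acc * x + a) % P
        return acc

    return [p(i) for i in range(1, 3 * n + 2)]


def sample_subset(sigma: int, n: int) -> list[int]:
    """Pick one index from each of n blocks: n-1 triples and a final block of four."""
    blocks = [[3 * j + 1, 3 * j + 2, 3 * j + 3] for j in range(n - 1)]
    blocks.append([3 * n - 2, 3 * n - 1, 3 * n, 3 * n + 1])
    chosen = []
    for block in blocks:
        sigma, idx = divmod(sigma, len(block))
        chosen.append(block[idx])
    return chosen


def interpolate_at_zero(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation of p(0) over GF(P) from (x, p(x)) pairs."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


# Usage: share a bit, derive S from sigma = sigma_0 XOR sigma_1, and reconstruct
# from the n points in S plus one extra point from T (n+1 points in total).
n, m = 4, 1
evals = share(m, n)
sigma0, sigma1 = secrets.randbits(64), secrets.randbits(64)
S = sample_subset(sigma0 ^ sigma1, n)
T = [i for i in range(1, 3 * n + 2) if i not in S]
points = [(i, evals[i - 1]) for i in S] + [(T[0], evals[T[0] - 1])]
assert interpolate_at_zero(points) == m
```

Because the polynomial has degree $n$, the $n$ evaluations opened in the cut-and-choose phase alone do not determine $p(0)$; one further evaluation from $T$ is what fixes the message at decommitment.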
Abstract. The covert security model (Aumann and Lindell, TCC 2007)
offers an important security/efficiency trade-off: a covert player may arbitrarily
cheat, but is caught with a certain fixed probability. This permits
more efficient protocols than the malicious setting while still giving mean- 
ingful security guarantees. However, one drawback is that cheating can- 
not be proven to a third party, which prevents the use of covert protocols 
in many practical settings. Recently, Asharov and Orlandi (ASIACRYPT 
2012) enhanced the covert model by allowing the honest player to generate
a proof of cheating, checkable by any third party. Their model,
which we call the PVC (publicly verifiable covert) model, offers a very
compelling trade-off.

Asharov and Orlandi (AO) propose a practical protocol in the PVC 
model, which, however, relies on a specific expensive oblivious transfer 
(OT) protocol incompatible with OT extension. In this work, we improve 
the performance of the PVC model by constructing a PVC-compatible 
OT extension as well as making several practical improvements to the 
AO protocol. As compared to the state-of-the-art OT extension-based 
two-party covert protocol, our PVC protocol adds relatively little: four 
signatures and an ≈ 67% wider OT extension matrix. This is a significant
improvement over the AO protocol, which requires public-key-based
OTs per input bit. We present detailed estimates showing (up to orders 
of magnitude) concrete performance improvements over the AO protocol 
and a recent malicious protocol. 


Keywords: Secure computation • Publicly verifiable covert security 


1 Introduction 

Two-party secure computation addresses the problem where two parties need to
evaluate a common function $f$ on their inputs while keeping the inputs private.
Several security models for secure computation have been proposed. The most
basic is the semi-honest model, where the parties are expected to follow the protocol
description but must not be able to learn anything about the other party's
input from the protocol transcript. A much stronger guarantee is provided by
the malicious model, where parties may deviate arbitrarily from the protocol
description. This additional security comes at a cost. Recent garbled circuit-based
protocols [3, 17] have an overhead of at least 40× that of their semi-honest
counterparts, and are considerably more complex.

A.J. Malozemoff: Work partially done while the author was at Bell Labs.

© International Association for Cryptologic Research 2015
T. Iwata and J.H. Cheon (Eds.): ASIACRYPT 2015, Part II, LNCS 9453, pp. 210-235, 2015.
DOI: 10.1007/978-3-662-48800-3_9

Public Verifiability in the Covert Model (Almost) for Free 211

Aumann and Lindell [8] introduced a very practical compromise between 
these two models, that of covert security. In the covert security model, a party 
can deviate arbitrarily from the protocol description but is caught with a fixed 
probability $\epsilon$, called the deterrence factor. In many practical scenarios, this guaranteed
risk of being caught (likely resulting in loss of business and/or embarrassment)
is sufficient to deter would-be cheaters. Importantly, covert protocols
are much more efficient and simpler than their malicious counterparts. 

Motivating the Publicly Verifiable Covert (PVC) Model. At the same 
time, the cheating deterrent introduced by the covert model is relatively weak. 
Indeed, a party catching a cheater certainly knows what happened and can 
respond accordingly, e.g., by taking their business elsewhere. However, the impact 
is largely limited to this, since the honest player cannot credibly accuse the 
cheater publicly. If, however, credible public accusation were possible, the deter- 
rent for the cheater would be immeasurably greater: suddenly, all the cheater’s 
customers would be aware of the cheating and thus any cheating may affect the 
cheater’s global customer base. 

The addition of credible accusation greatly improves the covert model even in 
scenarios with a small number of players, such as those involving the government. 
Consider, for example, the setting where two agencies are engaged in secure 
computation on their respective classified data. The covert model may often be 
insufficient here. Indeed, consider the case where one of the two players deviates 
from the protocol, perhaps due to an insider attack. The honest player detects 
this, but we are now faced with the problem of identifying the culprit across two 
domains, where the communication is greatly restricted due to trust, policy, data 
privacy legislation, or all of the above. On the other hand, credible accusation 
immediately provides the ability to exclude the honest player from the suspect 
list, and focus on tracking the problem within one organization/trust domain,
which is dramatically simpler.

PVC Definition and Protocol. Asharov and Orlandi [7] proposed a security
model, covert with public verifiability, and an associated protocol, addressing
these concerns. At a high level, they proposed that when cheating is detected, the
honest player is able to publish a “certificate of cheating” which can be checked
by any third party. In this work, we abbreviate their model as PVC: publicly
verifiable covert. Their proposed protocol (which we call the “AO protocol”) has
performance similar to the original covert protocol of Aumann and Lindell [8],
with the exception of requiring signed-OT, a special form of oblivious transfer
(OT). Their signed-OT construction is based on the OT of Peikert et al. [18],
and thus requires several expensive public-key operations.

In this work, we propose several critical performance improvements to the
AO protocol. Our most technically involved contribution is a novel signed-OT
extension protocol which eliminates per-instance public-key operations. Before
discussing our contributions and technical approach in Sect. 1.1, we review the
AO protocol.

The Asharov-Orlandi (AO) PVC Protocol [7]. The AO protocol is based
on the covert construction of Aumann and Lindell [8]. Let $P_1$ be the circuit
generator, $P_2$ be the evaluator, and $f(\cdot,\cdot)$ be the function to be computed. Recall
the standard garbled circuit (GC) construction in the semi-honest model: $P_1$
constructs a garbling of $f$ and sends it to $P_2$ along with the wire labels associated
with its input. The parties then run OT, with $P_1$ acting as the sender and
inputting the wire labels associated with $P_2$'s input, and $P_2$ acting as the receiver
and inputting as its choice bits the associated bits of its input.

We now adapt this protocol to the PVC setting. Recall the “selective failure”
attack on $P_2$'s input wires, where $P_1$ can send $P_2$ via OT an invalid wire label
for one of $P_2$'s two inputs and learn one of $P_2$'s input bits based on whether $P_2$
aborts. To protect against this attack, the parties construct
$f'(x_1, x_2^1, \ldots, x_2^{\nu}) = f(x_1, \bigoplus_{i \in [\nu]} x_2^i)$,
where $\nu$ is the XOR-tree replication factor, and compute $f'$
instead of $f$. Party $P_1$ then constructs $\lambda$ (the GC replication factor) garblings of
$f'$ and $P_2$ checks that $\lambda - 1$ of the GCs are correctly constructed, evaluating the
remaining GC to derive the output. The main difficulty of satisfying the PVC
model is ensuring that neither party can improve its odds by aborting (e.g.,
based on the other party's challenge). For example, if $P_1$ could abort whenever
$P_2$'s challenge would reveal $P_1$'s cheating, this would enable $P_1$ to cheat without
the risk of generating a proof of cheating. Thus, $P_1$ sends the GCs to $P_2$ through
a 1-out-of-$\lambda$ OT; namely, in the $i$th input to the OT $P_1$ provides openings for
all the GCs but the $i$th, as well as the input wire labels needed to evaluate $\mathrm{GC}_i$.
Party $P_2$ inputs a random $\gamma$, checks that all GCs besides $\mathrm{GC}_\gamma$ are constructed
correctly, and if so, evaluates $\mathrm{GC}_\gamma$.

Finally, it is necessary for $P_1$ to operate in a verifiable manner, so that
an honest $P_2$ has proof if $P_1$ tries to cheat and gets caught. (Note that GCs
guarantee that $P_2$ cannot cheat in the GC evaluation at all, so we only worry
about catching $P_1$.) The AO protocol addresses this by having $P_1$ sign all its
messages and the parties using signed-OT in place of all standard OTs (including
wire label transfers and GC openings). Informally, the signed-OT functionality
proceeds as follows: rather than the receiver $R$ getting message $m_b$ from the
sender $S$ for choice bit $b$, $R$ receives $((b, m_b), \sigma)$, where $\sigma$ is $S$'s signature of
$(b, m_b)$. This guarantees that if $R$ detects any cheating by $S$, it has $S$'s signature
on an inconsistent set of messages, which can be used as proof of this cheating.
Asharov and Orlandi show that this construction is $\epsilon$-PVC-secure for
$\epsilon = (1 - 1/\lambda)(1 - 2^{-\nu+1})$.
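
To make the two replication factors concrete, the following sketch shows an XOR encoding of a single input bit and evaluates the deterrence factor stated above for a few parameter choices. It is an illustration only: the function names `xor_encode`, `xor_decode`, and `deterrence` are hypothetical, and the encoding is shown at the level of one bit rather than a full input string.

```python
import secrets
from functools import reduce


def xor_encode(bit: int, nu: int) -> list[int]:
    """Split one input bit into nu bits whose XOR equals the original bit."""
    shares = [secrets.randbelow(2) for _ in range(nu - 1)]
    # The last share is chosen so that the XOR of all nu shares is `bit`.
    shares.append(reduce(lambda a, b: a ^ b, shares, bit))
    return shares


def xor_decode(shares: list[int]) -> int:
    """Recombine the shares, as f' does before calling f."""
    return reduce(lambda a, b: a ^ b, shares, 0)


def deterrence(lam: int, nu: int) -> float:
    """epsilon = (1 - 1/lambda) * (1 - 2^(-nu + 1))."""
    return (1 - 1 / lam) * (1 - 2 ** (-nu + 1))


# Usage: the encoding round-trips, and epsilon grows with both factors.
assert xor_decode(xor_encode(1, 5)) == 1
assert deterrence(2, 2) == 0.25
for lam, nu in [(2, 2), (3, 3), (10, 10)]:
    print(lam, nu, round(deterrence(lam, nu), 4))
```

The first factor of $\epsilon$ reflects the chance that a bad garbled circuit is among the $\lambda - 1$ opened ones; the second reflects the chance that tampering with the wire labels of the $\nu$ encoded bits is noticed.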


1.1 Our Contribution 

Our main contribution is a signed-OT extension protocol built on the recent
malicious OT extension of Asharov et al. [6]. Informally, signed-OT extension
ensures that (1) a cheating sender S is held accountable in the form of a “certificate
of cheating” that the honest receiver R can generate, and (2) a malicious
R cannot defame an honest S by presenting a false “certificate of cheating”.
Achieving the first goal is fairly straightforward by having S simply sign all its
messages. The challenge is in simultaneously protecting against a malicious R.
In particular, we need to commit R to its particular choices throughout the OT
extension protocol to prevent it from defaming an honest S, while maintaining
that those commitments do not leak any information about R's choices.

Recall that in the standard OT extension protocol of Ishai et al. [12] (cf.
Fig. 3), R constructs a random matrix M, and S obtains a matrix M' derived
from M, S's random string s and R's vector of OT inputs r. The key challenge
of adapting this protocol to the signed variant is to efficiently prevent R from
submitting a malleated M as part of the proof without it ever explicitly revealing
M to S (as this would leak R's choice bits). We achieve this by observing that
S does in fact learn some of M, as in the OT extension construction some of
the columns of M and M' are the same (i.e., those corresponding to zero bits of
S's string s). We prevent R from cheating by having S include in its signature
carefully selected information from the columns in M which S sees. Finally, we
require that R generates each row of M from a seed, and that R's proof of
cheating includes this seed such that the row rebuilt from the seed is consistent
with the columns included in S's signature. We show that this makes it infeasible
for R to successfully present an invalid row in the proof of cheating. We describe
this approach in greater detail in Sect. 3 (see also footnote 1).
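
The observation about shared columns can be checked with a toy computation. The sketch below is not the protocol itself: it computes S's matrix directly from M, r, and s (in the real protocol S obtains it through base OTs without seeing M), the SHA-256 counter-mode expansion in `expand_row` is an assumed stand-in for whatever pseudorandom generator an implementation would use, and all function names are hypothetical.

```python
import hashlib
import secrets


def expand_row(seed: bytes, width: int) -> list[int]:
    """Expand a seed into `width` pseudorandom bits (illustrative PRG only)."""
    bits: list[int] = []
    counter = 0
    while len(bits) < width:
        block = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        bits.extend((byte >> k) & 1 for byte in block for k in range(8))
        counter += 1
    return bits[:width]


def sender_view(M: list[list[int]], r: list[int], s: list[int]) -> list[list[int]]:
    """M'[i][j] = M[i][j] XOR (r[i] AND s[j]): column j is masked by r iff s[j] = 1."""
    return [[M[i][j] ^ (r[i] & s[j]) for j in range(len(s))] for i in range(len(r))]


def column(matrix: list[list[int]], j: int) -> list[int]:
    return [row[j] for row in matrix]


num_ots, width = 8, 16
seeds = [secrets.token_bytes(16) for _ in range(num_ots)]  # one seed per row of M
r = [secrets.randbelow(2) for _ in range(num_ots)]         # R's choice bits
s = [secrets.randbelow(2) for _ in range(width)]           # S's random string

M = [expand_row(seed, width) for seed in seeds]
M_prime = sender_view(M, r, s)

# Columns at zero bits of s are identical in M and M'.
shared = [j for j in range(width) if s[j] == 0]
assert all(column(M, j) == column(M_prime, j) for j in shared)

# A row rebuilt from its seed agrees with every column S saw in the clear.
assert all(expand_row(seeds[0], width)[j] == M_prime[0][j] for j in shared)
```

The second assertion mirrors the consistency check described above: a row that R later presents in a proof of cheating must match, on the shared columns, what S already saw and signed.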

As another contribution, we present a new more communication efficient PVC 
protocol, building off of the AO protocol; see Sect. 4. Our main (simple) trick 
there is a careful amendment allowing us to send GC hashes instead of GCs; this 
is based on an idea from Goyal et al. [11]. 

We work in the random oracle model, a slight strengthening of the assump- 
tions needed for standard OT extension and free-XOR, two standard secure 
computation tools. 

Comparison with Existing Approaches. The cost of our protocol is almost
the same as that of the covert protocol of Goyal et al. [11]; the only extra cost
is essentially an ≈ 67% wider OT extension matrix and four signatures. This
often negligible additional overhead (versus covert protocols) provides us with
a dramatically stronger deterrent than covert security. We believe that our PVC
protocol could be used in many applications where covert security is insufficient, at an
order-of-magnitude cost advantage over previously needed malicious protocols
or the PVC protocol of Asharov and Orlandi [7]. See Sect. 5 for more details.

Related Work. The only directly related work is that of Asharov and Orlandi [7],
already discussed at length. We also note a recent line of work on secure
computation with cheaters (including fairness violators) punished by an external
entity, such as the Bitcoin network [4, 10, 16]. Similarly to the PVC model and our
protocols, this line of work relies on generating proofs of misbehavior which could
be accepted by a third-party authority. However, these works address a different
setting and use different techniques; in particular, they build on maliciously secure
computation and require the Bitcoin framework.

1 Our construction is also interesting from a theoretical perspective in that we construct
signed-OT from any maliciously secure OT protocol, whereas Asharov and
Orlandi [7] build a specific construction based on the Decisional Diffie-Hellman
problem.

2 Preliminaries 

Let $\kappa$ denote the (computational) security parameter, let $\rho$ denote the statistical
security parameter, and let $\tau$ denote the field size. When considering concrete
costs, we utilize the security parameter and field size settings for key lengths
recommended by NIST [9]; see Fig. 1. We use ppt to denote “probabilistic polynomial
time” and let $\mathrm{negl}(\cdot)$ denote a negligible function in its input. We consider
two-party protocols between parties $P_1$ and $P_2$, and when we use subscript
$i \in \{1,2\}$ to denote a party we let subscript $-i = 3 - i$ denote the other party.
We use $i^* \in \{1,2\}$ to denote a malicious party and $-i^* = 3 - i^*$ to denote the
associated honest party.


Security   $\kappa$   FCC    ECC
Short      80         1024   160
Long       128        3072   256

Fig. 1. Settings for (computational) security parameter $\kappa$ and field size $\tau$ for various
security settings as recommended by NIST [9]. FCC denotes the setting of $\tau$ when
using finite field cryptography and ECC denotes the setting of $\tau$ when using elliptic
curve cryptography.

We use bold lowercase letters (e.g., x) to denote bitstrings and use the notation
x[i] to denote the $i$th bit in bitstring x. Likewise, we use bold uppercase
letters (e.g., T) to denote matrices over bits. We use $[n]$ to denote $\{1, \ldots, n\}$.
Let “$a \leftarrow f(x_1, x_2, \ldots)$” denote setting $a$ to be the deterministic output of $f$
on inputs $x_1, x_2, \ldots$; the notation “$a \xleftarrow{\$} f(x_1, x_2, \ldots)$” is the same except that
$f$ here is randomized. We abuse notation and let $a \xleftarrow{\$} S$ denote selecting $a$ uniformly
at random from set $S$.

Our constructions are in the $\mathcal{F}_{\mathrm{PKI}}$ model, where each party $P_i$ can register a
verification key, and other parties can retrieve $P_i$'s verification key by querying
$\mathcal{F}_{\mathrm{PKI}}$ on $\mathrm{id}_{P_i}$. We use the notation $\mathsf{Sign}_{P_i}(\cdot)$ to denote a signature signed by $P_i$'s
secret key, and we assume that this signature can be verified by any third party.
We often leave off the subscript if the identity of the signing party is clear.

2.1 Publicly Verifiable Covert Security 

We assume the reader is familiar with the covert security model; however,
we review the less familiar publicly verifiable covert (PVC) security model of
Asharov and Orlandi [7] below. When we say a protocol is “secure in the covert
model,” we assume it is secure under the strong explicit cheat formulation with
$\epsilon$-deterrent [8, §3.4], for some value of $\epsilon$.

Let $\pi$ be a two-party protocol between parties $P_1$ and $P_2$ implementing function
$f$. Following Aumann and Lindell [8], we call $\pi$ non-halting if for honest $P_i$
and fail-stop adversary $P_{-i}$ (see footnote 2) the probability that $P_i$ outputs
$\mathsf{corrupted}_{-i}$ is negligible. Consider the triple of algorithms
$(\pi', \mathsf{Blame}, \mathsf{Judgment})$ defined as follows:

- Protocol $\pi'$ is the same as $\pi$ except that if an honest party $P_{-i^*}$ outputs
$\mathsf{corrupted}_{i^*}$ when executing $\pi$, it computes $\mathsf{Cert} \leftarrow \mathsf{Blame}(\mathrm{id}_{P_{i^*}}, \mathsf{key}, \mathsf{View}_{P_{-i^*}})$,
where key denotes the type of cheating detected, and sends Cert to $P_{i^*}$.

- Algorithm Blame is a deterministic algorithm which takes as input a cheating
identity id, a cheating type key, and a view View of a protocol execution, and
outputs a certificate Cert.

- Algorithm Judgment is a deterministic algorithm which takes as input a certificate
Cert and outputs either an identity id or $\bot$.

Before proceeding to the definition, we first introduce some notation. Let
$\mathrm{Exec}_{\pi,\mathcal{A}(z)}(x_1, x_2; 1^\kappa)$ denote the transcript (i.e., messages and output) produced
by $P_1$ with input $x_1$ and $P_2$ with input $x_2$ running protocol $\pi$, where adversary
$\mathcal{A}$ with auxiliary input $z$ can corrupt parties before execution begins. Let
$\mathrm{Output}_{P_i}(\mathrm{Exec}_{\pi,\mathcal{A}(z)}(x_1, x_2; 1^\kappa))$ denote the output of $P_i$ on the input transcript.

Definition 1. We say that $(\pi', \mathsf{Blame}, \mathsf{Judgment})$ securely computes $f$ in the
presence of a publicly verifiable covert adversary with $\epsilon$-deterrent (or, is $\epsilon$-PVC-secure)
if the following conditions hold:

1. The protocol $\pi'$ is a non-halting and secure realization of $f$ in the covert model
with $\epsilon$-deterrent.

2. (Accountability) For every ppt adversary $\mathcal{A}$ corrupting party $P_{i^*}$, there exists
a negligible function $\mathrm{negl}(\cdot)$ such that if
$\mathrm{Output}_{P_{-i^*}}(\mathrm{Exec}_{\pi,\mathcal{A}(z)}(x_1, x_2; 1^\kappa)) = \mathsf{corrupted}_{i^*}$
then $\Pr[\mathsf{Judgment}(\mathsf{Cert}) = \mathrm{id}_{P_{i^*}}] > 1 - \mathrm{negl}(\kappa)$, where
$\mathsf{Cert} \leftarrow \mathsf{Blame}(\mathrm{id}_{P_{i^*}}, \mathsf{key}, \mathsf{View}_{P_{-i^*}})$
and the probability is over the randomness used in the protocol execution.

3. (Defamation-free) For every ppt adversary $\mathcal{A}$ corrupting party $P_{i^*}$ and outputting
a certificate Cert, there exists a negligible function $\mathrm{negl}(\cdot)$ such that
$\Pr[\mathsf{Judgment}(\mathsf{Cert}) = \mathrm{id}_{P_{-i^*}}] < \mathrm{negl}(\kappa)$, where the probability is over the
randomness used by $\mathcal{A}$.

Note that, in particular, the PVC definition implicitly disallows Blame to reveal
$P_{-i^*}$'s input. This is because $\pi'$ specifies that Cert is sent to $P_{i^*}$.

2 A fail-stop adversary is one which acts semi-honestly but may halt at any time.

2.2 Signed Oblivious Transfer

A central functionality for constructing PVC protocols is signed oblivious transfer
(signed-OT). Introduced by Asharov and Orlandi [7], the basic signed-OT
functionality $\mathcal{F}$ can be defined as

    $(\bot, (m_b, \mathsf{Sign}_{sk}(b, m_b))) \leftarrow \mathcal{F}((m_0, m_1, sk), (b, vk))$,

where the signature scheme is assumed to be existentially unforgeable under
adaptive chosen message attack (EU-CMA). Namely, the sender $S$ inputs two
messages $m_0$ and $m_1$ along with a signing key $sk$; the receiver $R$ inputs a choice
bit $b$ and a verification key $vk$; $S$ receives no output whereas $R$ receives $m_b$
alongside a signature on $(b, m_b)$.
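
The input/output behaviour of the basic signed-OT functionality can be written down directly. In the sketch below the signing algorithm is passed in as a parameter, since the functionality is agnostic to the concrete scheme. The helper `toy_sign` is a hypothetical placeholder built from HMAC purely so that the example runs with the standard library: it is a keyed MAC, not a publicly verifiable signature, and must not be read as a suitable instantiation.

```python
import hashlib
import hmac


def signed_ot(sender_input, receiver_input, sign):
    """Ideal signed-OT: S gets nothing, R gets m_b and a signature on (b, m_b)."""
    m0, m1, sk = sender_input
    b, _vk = receiver_input  # vk is part of R's input but unused by the functionality itself
    m_b = m1 if b else m0
    return None, (m_b, sign(sk, (b, m_b)))


def toy_sign(sk: bytes, payload) -> bytes:
    """Placeholder only: an HMAC tag standing in for Sign_sk(b, m_b)."""
    b, m = payload
    return hmac.new(sk, bytes([b]) + m, hashlib.sha256).digest()


# Usage: the receiver with choice bit 1 learns m_1 and a tag binding it to b = 1.
sender_out, (msg, sig) = signed_ot((b"label-0", b"label-1", b"key"), (1, None), toy_sign)
assert sender_out is None
assert msg == b"label-1"
assert sig == toy_sign(b"key", (1, b"label-1"))
```

Signing the pair $(b, m_b)$ rather than $m_b$ alone is what lets the receiver later show which choice bit a given message was delivered for.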

However, as in prior work [7], this definition is too strong for our signed-OT
extension construction to satisfy. We introduce a relaxed signed-OT variant
(slightly different from Asharov and Orlandi's variant [7]) which is tailored for
OT extension and is sufficient for obtaining PVC-security. Essentially, we need
a signature scheme that satisfies a weaker notion than EU-CMA in which the
signing algorithm takes randomness, a portion of which can be controlled by the
adversary (see footnote 3). This is because in our signed-OT extension construction, a malicious
party can influence the randomness used in the signing algorithm. In addition,
we introduce an associated data parameter to the signing algorithm which allows
the signer to specify some additional information unrelated to the message being
signed but used in the signature. In our construction, we use the associated data
to tie the signature to a specific counter (such as a session ID or message ID),
preventing a malicious receiver from “mixing” properly signed values to defame
an honest sender.

Let $\Pi = (\mathsf{Gen}, \mathsf{Sign}, \mathsf{Verify})$ be a tuple of ppt algorithms over message space
$\mathcal{M}$, associated data space $\mathcal{D}$, and randomness spaces $\mathcal{R}_1$ and $\mathcal{R}_2$, defined as
follows:

1. $\mathsf{Gen}(1^\kappa)$: On input security parameter $1^\kappa$, output key pair $(vk, sk)$.

2. $\mathsf{Sign}_{sk}(m, a; (r_1, r_2))$: On input secret key $sk$, message $m \in \mathcal{M}$, associated data
$a \in \mathcal{D}$, and randomness $r_1 \in \mathcal{R}_1$ and $r_2 \in \mathcal{R}_2$, output signature $\sigma = (a, \sigma')$.

3. $\mathsf{Verify}_{vk}(m, \sigma)$: On input verification key $vk$, message $m \in \mathcal{M}$, and signature
$\sigma$, output 1 if $\sigma$ is a valid signature for $m$ and 0 otherwise.

For security, we need the condition that unforgeability remains even if the adversary
inputs some arbitrary $r_1$ or $r_2$. However, the adversary is prevented from
inputting values for both $r_1$ and $r_2$. This reflects the fact that in our signed-OT
extension construction, a malicious sender can control only $r_1$ and a malicious
receiver can control only $r_2$. We place a further restriction that the choice of $r_1$
must be consistent; namely, all queries to Sign must use the same value for $r_1$.
Looking ahead, this property exactly captures the condition we need ($r_1$ corresponds
to the zero bits in the sender's column selection string in the OT
extension), where the choice of $r_1$ is made once and then fixed throughout the
protocol execution.

3 Our notion is similar to the $\rho$-EU-CMRA notion introduced by Asharov and
Orlandi [7]. It differs in that we allow different portions of the randomness to be
corrupted, but not both portions at once. Looking forward, this is needed because
the sender in our signed-OT functionality is only allowed to control some of the
randomness.

Towards our definition, we define an oracle $\mathcal{O}_{sk}(\cdot, \cdot, \cdot, \cdot)$ as follows. Let $\bot$ be
a special symbol. On input $(m, a, r_1, r_2)$, proceed as follows. If neither $r_1$ nor $r_2$
equals $\bot$, output $\bot$. Otherwise, proceed as follows. If $r_1 = \bot$ and $r_1'$ has not been
set, set $r_1'$ uniformly at random; if $r_1 \neq \bot$ and $r_1'$ has not been set, set $r_1' = r_1$;
if $r_2 = \bot$, set $r_2'$ uniformly at random; otherwise, set $r_2' = r_2$. Finally, output
$\mathrm{Sign}_{sk}(m, a; (r_1', r_2'))$.
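The oracle's bookkeeping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signing algorithm is passed in as a hypothetical callable `sign(sk, m, a, r1, r2)`, `None` stands for $\bot$, and randomness is modelled as 16-byte strings; none of these choices come from the source.

```python
import secrets

BOTTOM = None  # stands for the special symbol "bottom"


class PartialRandomnessOracle:
    """Sketch of the oracle O_sk: the caller may fix r1 or r2, never both."""

    def __init__(self, sign, sk, rand_bytes=16):
        self._sign = sign            # hypothetical callable sign(sk, m, a, r1, r2)
        self._sk = sk
        self._rand_bytes = rand_bytes
        self._r1_fixed = None        # r_1' is set on first use and then reused
        self.queries = set()         # the set Q of (m, a) pairs input to the oracle

    def query(self, m, a, r1=BOTTOM, r2=BOTTOM):
        self.queries.add((m, a))
        # If neither r1 nor r2 equals bottom, the oracle answers bottom.
        if r1 is not BOTTOM and r2 is not BOTTOM:
            return BOTTOM
        # r_1' is chosen once: either the caller's value or a fresh random one.
        if self._r1_fixed is None:
            self._r1_fixed = (
                secrets.token_bytes(self._rand_bytes) if r1 is BOTTOM else r1
            )
        # r_2' is chosen per query.
        r2_used = secrets.token_bytes(self._rand_bytes) if r2 is BOTTOM else r2
        return self._sign(self._sk, m, a, self._r1_fixed, r2_used)
```

Note that a later query supplying a different $r_1$ is answered with the value fixed on the first query, which mirrors the consistency restriction described above.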

Now, consider the following game $\text{Sig-forge}^{\mathrm{CMPRA}}_{\mathcal{A},\Pi}(\kappa)$ for signature scheme $\Pi$
between ppt adversary $\mathcal{A}$ and ppt challenger $\mathcal{C}$.

1. $\mathcal{C}$ runs $(vk, sk) \leftarrow \mathrm{Gen}(1^{\kappa})$ and sends vk to $\mathcal{A}$.

2. $\mathcal{A}$, who has oracle access to $\mathcal{O}_{sk}(\cdot, \cdot, \cdot, \cdot)$, outputs a tuple $(m, (a, \sigma'))$. Let $Q$
be the set of messages and associated data pairs input to $\mathcal{O}_{sk}(\cdot, \cdot, \cdot, \cdot)$.

3. $\mathcal{A}$ succeeds if and only if (1) $\mathrm{Verify}_{vk}(m, (a, \sigma')) = 1$ and (2) $(m, a) \notin Q$.

Definition 2. Signature scheme $\Pi$ = (Gen, Sign, Verify) is existentially
unforgeable under adaptive chosen message and partial randomness attack (EU-CMPRA)
if for all ppt adversaries $\mathcal{A}$ there exists a negligible function negl($\cdot$) such that
$\Pr[\text{Sig-forge}^{\mathrm{CMPRA}}_{\mathcal{A},\Pi}(\kappa)] \le \mathrm{negl}(\kappa)$.


Functionality $\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$

The functionality is parameterized by an EU-CMPRA signature scheme $\Pi$ =
(Gen, Sign, Verify).

Input: The sender inputs messages $m_0$ and $m_1$ such that $|m_0| = |m_1|$, secret
key sk, associated data $a$, randomness $r_1^*$, and signatures $\sigma_0^*$ and $\sigma_1^*$. The receiver
inputs choice bit $b$, verification key vk, and randomness $r_2^*$. If the sender (resp.,
the receiver) is honest, then $r_1^* = \sigma_0^* = \sigma_1^* = \bot$ (resp., $r_2^* = \bot$).

Output: The functionality computes $\sigma_b = \mathrm{Sign}_{sk}((b, m_b), a; (r_1^*, r_2^*))$ for $b \in \{0,1\}$.
The sender receives no output. The receiver receives the following output
based on whether the sender is corrupt or not:

- If either $\sigma_0^* \neq \bot$ or $\sigma_1^* \neq \bot$, the functionality outputs $((b, m_b), \sigma_b^*)$ if and only
if $\mathrm{Verify}_{vk}((0, m_0), \sigma_0^*) = \mathrm{Verify}_{vk}((1, m_1), \sigma_1^*) = 1$, where $\sigma_i^* \leftarrow \sigma_i$ if $\sigma_i^* = \bot$;
otherwise it outputs abort.

- If $\sigma_0^* = \sigma_1^* = \bot$, the functionality outputs $((b, m_b), \sigma_b)$.


Fig. 2. Signed oblivious transfer functionality.


Signed-OT Functionality. We are now ready to introduce our relaxed signed-OT
functionality. As with our EU-CMPRA signature, it is tailored for OT extension,
and is sufficient for building PVC protocols. This functionality, denoted
by $\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$, is parameterized by an EU-CMPRA signature scheme $\Pi$ and is
defined in Fig. 2. As in standard OT, the sender inputs two messages (of equal
length) and the receiver inputs a choice bit. However, in this formulation we allow
a malicious sender to specify some random value $r_1^*$ as well as signatures $\sigma_0^*$ and
$\sigma_1^*$. Likewise, a malicious receiver can specify some random value $r_2^*$. (Honest
players input $\bot$ for these values.) If both players are honest, the functionality
computes $\sigma \leftarrow \mathrm{Sign}((b, m_b); (r_1, r_2))$ with uniformly random values $r_1$ and $r_2$
and outputs $((b, m_b), \sigma)$ to the receiver. However, if either party is malicious and
specifies some random value, this is fed into the Sign algorithm. Likewise, if the
sender is malicious and specifies some signature $\sigma^* \neq \bot$, this value is used as
the signature sent to the receiver.

Note that $\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$ is nearly identical to the signed-OT functionality
presented by Asharov and Orlandi [7, Functionality 2]; it differs in the use of
EU-CMPRA signature schemes instead of p-EU-CMRA schemes. We also note that it
is straightforward to adapt $\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$ to realize OTs with more than two inputs
from the sender. We let $\binom{\lambda}{1}$-$\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$ denote a 1-out-of-$\lambda$ variant of $\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$.

A Compatible Commitment Scheme. Our construction of an EU-CMPRA
signature scheme (cf. Sect. 3.3) uses a non-interactive commitment scheme, which
we define here. Our definition follows the standard commitment definition, except
we tweak the Com algorithm to take an additional associated data value.

Let $\Pi_{\mathrm{Com}}$ = (ComGen, Com) be a tuple of ppt algorithms over message space
$\mathcal{M}$ and associated data space $\mathcal{D}$, defined as follows:

1. $\mathrm{ComGen}(1^{\kappa})$: On input security parameter $1^{\kappa}$, compute parameters params.

2. $\mathrm{Com}(m, a; r)$: On input message $m \in \mathcal{M}$, associated data $a \in \mathcal{D}$, and
randomness $r$, output commitment com.

A commitment can be opened by revealing the randomness $r$ used to construct
that commitment.

We now define security for our commitment scheme. We only consider the 
binding property; namely, the inability for a PPT adversary to open a commit- 
ment to some other value than that committed to. Security is the same as for 
standard commitment schemes, except we allow the adversary to control the 
randomness used in ComGen. 

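Consider the game $\text{Com-bind}_{\mathcal{A},\Pi_{\mathrm{Com}}}(\kappa)$ for commitment scheme $\Pi_{\mathrm{Com}}$ between
a ppt adversary $\mathcal{A}$ and a ppt challenger $\mathcal{C}$, defined as follows.

1. $\mathcal{A}$ sends randomness $r$ to $\mathcal{C}$.

2. $\mathcal{C}$ computes params $\leftarrow \mathrm{ComGen}(1^{\kappa}; r)$ and sends params to $\mathcal{A}$.

3. $\mathcal{A}$ outputs $(\mathrm{com}, m_1, a_1, r_1, m_2, a_2, r_2)$ and wins if and only if (1) $m_1 \neq m_2$,
and (2) $\mathrm{com} = \mathrm{Com}(\mathrm{params}, m_1, a_1; r_1) = \mathrm{Com}(\mathrm{params}, m_2, a_2; r_2)$.

Definition 3. A commitment scheme $\Pi_{\mathrm{Com}}$ = (ComGen, Com) is (computationally)
binding if for all ppt adversaries $\mathcal{A}$, there exists a negligible function negl($\cdot$)
such that $\Pr[\text{Com-bind}_{\mathcal{A},\Pi_{\mathrm{Com}}}(\kappa)] \le \mathrm{negl}(\kappa)$.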
3 Signed Oblivious Transfer Extension

We now present our main contribution: an efficient instantiation of signed obliv- 
ious transfer (signed-OT) extension. We begin in Sect. 3.1 by describing in detail 
the logic of the construction, iteratively building it up from the passively secure 
protocol of Ishai et al. [12]. We motivate the need for EU-CMPRA signature 
schemes in Sect. 3.2 and present a compatible such scheme in Sect. 3.3. In Sect. 3.4 
we present the proof of security. 


3.1 Intuition for the Construction 

Consider the OT extension protocol of Ishai et al. [12] in Fig. 3, run between
sender S and receiver R. This protocol is secure against a semi-honest R and
malicious S. We show how to convert this protocol into one which satisfies the
$\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$ functionality defined in Fig. 2. For clarity of presentation, we build
on the protocol of Fig. 3 and later discuss how to support a malicious R as well,
based on the malicious OT extension protocol of Asharov et al. [6].


S's inputs: Message pairs $\{(x_j^0, x_j^1)\}_{j \in [m]}$, where each $x_j^0, x_j^1 \in \{0,1\}^n$.

R's inputs: Selection bits $r = (r_1, \ldots, r_m)$.

Common inputs: Security parameter $\kappa$; number of base OTs $\ell$ ($= \kappa$); hash
function $H : \mathbb{N} \times \{0,1\}^{\ell} \to \{0,1\}^n$; ideal functionality $\mathcal{F}_{\mathrm{OT}}$.

1. Initial OT Phase:

- S computes $s \leftarrow_{\$} \{0,1\}^{\ell}$.

- R generates a random $m \times \ell$ matrix $T$, where the $j$th row is $t_j$ and the
$i$th column is $t^i$. Likewise, R generates a random $m \times \ell$ matrix $V$, where
the $j$th row is $v_j$ and the $i$th column is $v^i$.

- S and R run $\mathcal{F}_{\mathrm{OT}}$ $\ell$ times in parallel, where S acts as the receiver with
input $s_i$ in the $i$th OT and R acts as the sender with input $(t^i, v^i)$ in
the $i$th OT.

2. OT Extension Phase (Part I):

- For $i \in [\ell]$, R sets $u^i \leftarrow t^i \oplus v^i \oplus r$, and sends $u^i$ to S.

3. OT Extension Phase (Part II):

- Let $Q$ be the $m \times \ell$ matrix where each column $q^i = (s_i \cdot (u^i \oplus v^i)) \oplus ((1 - s_i) \cdot t^i)$.
Note that $q^i = (s_i \cdot r) \oplus t^i$ and $q_j = (r_j \cdot s) \oplus t_j$.

- For $j \in [m]$, S computes $y_j^0 \leftarrow x_j^0 \oplus H(j, q_j)$ and $y_j^1 \leftarrow x_j^1 \oplus H(j, q_j \oplus s)$,
and sends $y_j^0$ and $y_j^1$ to R.

- For $j \in [m]$, R computes $x_j \leftarrow y_j^{r_j} \oplus H(j, t_j)$.

4. Output:

- S outputs $\bot$ and R outputs $\{x_j\}_{j \in [m]}$.


Fig. 3. Protocol implementing passively secure OT extension [5,12].
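To make the data flow of Fig. 3 concrete, the following Python sketch runs both roles in a single process and checks only correctness, not security. It is a toy under stated assumptions: the base OTs are replaced by a direct lookup standing in for the ideal functionality, SHAKE-256 stands in for the hash $H$, and the function names are illustrative rather than taken from the source.

```python
import hashlib
import secrets


def H(j, bits, n_bytes):
    # Stand-in for H : N x {0,1}^ell -> {0,1}^n (SHAKE-256 is an assumption).
    return hashlib.shake_256(j.to_bytes(8, "big") + bytes(bits)).digest(n_bytes)


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def passive_ot_extension(pairs, r, ell=128):
    """Run S and R of the passively secure protocol; return R's outputs."""
    m, n_bytes = len(pairs), len(pairs[0][0])
    # Step 1: S picks s; R picks random m x ell bit matrices T and V.
    s = [secrets.randbits(1) for _ in range(ell)]
    T = [[secrets.randbits(1) for _ in range(ell)] for _ in range(m)]
    V = [[secrets.randbits(1) for _ in range(ell)] for _ in range(m)]
    # Ideal base OTs: in the i-th OT, S learns column t^i if s_i = 0, else v^i.
    got = [[(V if s[i] else T)[j][i] for j in range(m)] for i in range(ell)]
    # Step 2: R sends u^i = t^i xor v^i xor r for every column i.
    U = [[T[j][i] ^ V[j][i] ^ r[j] for j in range(m)] for i in range(ell)]
    # Step 3: S forms q^i = s_i * (u^i xor v^i) xor (1 - s_i) * t^i.
    Q = [[(U[i][j] ^ got[i][j]) if s[i] else got[i][j] for j in range(m)]
         for i in range(ell)]
    outputs = []
    for j in range(m):
        q_j = [Q[i][j] for i in range(ell)]
        q_j_xor_s = [a ^ b for a, b in zip(q_j, s)]
        y0 = xor_bytes(pairs[j][0], H(j, q_j, n_bytes))
        y1 = xor_bytes(pairs[j][1], H(j, q_j_xor_s, n_bytes))
        # R unmasks the message matching its selection bit using row t_j.
        outputs.append(xor_bytes((y0, y1)[r[j]], H(j, T[j], n_bytes)))
    return outputs
```

Because $q_j = (r_j \cdot s) \oplus t_j$, the row $t_j$ equals $q_j$ when $r_j = 0$ and $q_j \oplus s$ when $r_j = 1$, which is why R can remove exactly one of the two masks.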
As a first attempt, suppose S simply signs all its messages in Step 3. Recall
that we will use this construction to have $P_1$ send the appropriate input wire
labels to $P_2$; namely, $P_1$ acts as S in the OT extension and inputs the wire labels
for $P_2$'s input wires whereas $P_2$ acts as R and inputs its input bits. Thus, our
first step is to enhance the protocol in Fig. 3 to have S send $\sigma' \leftarrow \mathrm{Sign}((j, y_j^0))$
and $\sigma'' \leftarrow \mathrm{Sign}((j, y_j^1))$ in Step 3.

Now, if $P_2$ gets an invalid (with respect to a signed GC sent in the PVC protocol
of Sect. 4) wire label $x_j^b$, it can easily construct a certificate Cert which
demonstrates $P_1$'s cheating. Namely, it outputs as its certificate the tuple
$(b, j, y_j^0, y_j^1, \sigma', \sigma'', t_j)$ along with the (signed by $P_1$ and opened) GC containing the invalid
wire label. A third party can (1) check that $\sigma'$ and $\sigma''$ are valid signatures and
(2) compute $x_j^b \leftarrow H(j, t_j) \oplus y_j^b$ and check that $x_j^b$ is indeed an invalid wire label
for the given garbled circuit.

This works for protecting against a malicious $P_1$; however, note that $P_2$ can
easily defame an honest $P_1$ by outputting $t_j^* \neq t_j$ as part of its certificate (in
which case $x_j^b \leftarrow H(j, t_j^*) \oplus y_j^b$ will very likely be an invalid wire label). Thus,
the main difficulty in constructing signed-OT extension is tying $P_2$ to its choice
of the matrix $T$ generated in Step 1 of the protocol so it cannot blame an honest
$P_1$ by using invalid rows $t_j^*$ in its certificate.

Towards this end, consider the following modification. In Step 1, R now
additionally sends commitments to each $t_j$ to S, and S signs these and sends
them as part of its messages in Step 3. This prevents R from later changing $t_j$
to blame S. This does not quite work, however, as R could simply commit to an
incorrect $t_j^*$ in the first place! Clearly, R cannot send $T$ to S, as this would leak
R's selection bits, yet we still need R to somehow be committed to its choice of
the matrix $T$. The key insight is noting that S does in fact know some of the
bits of $T$; namely, it knows those columns at which $s_i = 0$ (as it learns $t^i$ in the
base OT). We can use this information to tie R to its choice of $T$ such that it
cannot later construct some matrix $T^* \neq T$ to defame S.

We do this by enhancing Step 3 as follows. Let $I^0$ be the set of indices $i$ such
that $s_i = 0$ (recall that $s$ is the random selection bits of S input to the base OTs
in Step 1). Let $t_{j,i}$ denote the $i$th bit in row $t_j$. Note that S knows the values
of $t_{j,i}$ for $i \in I^0$, and could thus compute $\{(i, t_{j,i})\}_{i \in I^0}$ as a "binding" of R's
choice of $t_j$. By including this information in its signature, S enforces that any
$t_j^*$ that R tries to use to blame S must match in the given positions. This brings
us closer to our goal; however, there are still two issues that we need to resolve:

1. Sending $\{(i, t_{j,i})\}_{i \in I^0}$ to R leaks $s$, which allows R to learn both of S's inputs.
We address this by increasing the number of base OTs in Step 1 and having
S only send some subset $I \subseteq I^0$ such that $|I| = \kappa$. Thus, while R learns that
$s_i = 0$ for $i \in I$, by increasing the number of base OTs enough, R does not
have enough information to recover $s$.

2. R can still flip one bit in $t_j$ and pass the check with high probability. We
fix this by having each $t_j$ be generated by a seed $k_j$. Namely, R computes
$t_j \leftarrow G(k_j)$ in Step 1, where $G$ is a random oracle$^4$. Then, when blaming S, R
must reveal $k_j$ instead of $t_j$. Thus, with high probability a malicious polytime
R cannot find some $k_j^* \neq k_j$ such that the Hamming distance between $G(k_j^*)$
and $G(k_j)$ is small enough that the above check succeeds.

Finally, note that we have thus far considered the passively secure OT extension
protocol, which is insecure against a malicious R. We thus utilize the
maliciously secure OT extension protocol of Asharov et al. [6]. The only way R can
cheat in passively secure OT extension is by using different $r$ values in Step 2.
Asharov et al. add a "consistency check" phase between Steps 1 and 2 to enforce
that $r$ is consistent. This does not affect our construction, and thus we can
include this step to complete the protocol$^5$. We refer the reader to Asharov
et al. [6] for the justification and intuition of this step; as far as this work is
concerned we can treat this consistency check as a "black box".

Observation 1 (OT Extension Matrix Size). We set $\ell$, the number of base
OTs, so that leaking $\kappa$ bits to R does not allow it to recover $s$ and thus both
messages. We do this as follows. Let $\ell'$ be the number of base OTs required in
malicious OT extension [6]. We set $\ell = \ell' + \kappa$ and require that when S chooses $s$,
it first fixes $\kappa$ randomly selected bits to zero before randomly setting the rest of
the bits. Now, when S reveals $I$ to R, the number of unknown bits in $s$ is equal to
$\ell'$ and thus the security of the Asharov et al. scheme carries over to our setting.
Asharov et al. set $\ell' \approx 1.6\kappa$, and thus our use of $\kappa$ extra columns results in an
$\approx 67\%$ matrix size increase.
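The sender's choice of $s$ described in Observation 1 can be sketched as follows. The function name and the use of Python's `secrets` module are illustrative assumptions; any concrete values passed for $\ell'$ and $\kappa$ are examples, not parameters taken from the source.

```python
import secrets


def choose_selection_string(ell_prime, kappa):
    """Return (s, I): s has ell' + kappa bits, with s_i = 0 for every i in I."""
    ell = ell_prime + kappa
    rng = secrets.SystemRandom()
    # First fix kappa randomly selected positions to zero ...
    I = sorted(rng.sample(range(ell), kappa))
    zero_positions = set(I)
    # ... then fill the remaining ell' positions with uniformly random bits.
    s = [0 if i in zero_positions else secrets.randbits(1) for i in range(ell)]
    return s, I
```

Revealing `I` later discloses only the $\kappa$ forced zeros; the other $\ell'$ bits of `s` remain unknown to the receiver.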

Observation 2 (Batching Signatures). The main computational cost of our
protocol is the signatures sent by S in Step 4. This cost can easily be brought to
negligible, as follows. Recall that when using our protocol for transferring the
input wire labels of a GC using free-XOR we can optimize the communication
slightly by setting $x_j^0 \leftarrow H(j, q_j)$ and $y_j^1 \leftarrow x_j^0 \oplus \Delta \oplus H(j, q_j \oplus s)$, where $\Delta$ is
the free-XOR global offset. Thus, S only needs to send (and sign) $y_j^1$.

The most important idea, however, is to batch messages across OT executions
and have S sign (and send) only one signature which includes all the necessary
information across many OTs. Namely, using the free-XOR optimization above,
S signs and sends the tuple $(I, \{y_j^1, \{t_{j,i}\}_{i \in I}\}_{j \in [m]})$ to R. We note that the $j$
values need not be sent as they are implied by the protocol execution.

Figure 4 gives the full protocol for signed-OT extension. For clarity of
presentation, this description and the following proof of security do not take into
account the optimizations described in Observation 2.

$^4$ Note that $G$ cannot be a pseudorandom generator because the input to $G$ is not
necessarily uniform as the inputs may be adversarially chosen by R.

$^5$ The reason this does not affect our construction is that the consistency check
phase only involves R sending messages to S. A malicious R cannot defame S because
we are only enforcing that R's value $r$ is consistent.
S's inputs: Messages $\{(x_j^0, x_j^1)\}_{j \in [m]}$ where $x_j^0, x_j^1 \in \{0,1\}^n$; signing key sk.

R's inputs: Selection bits $r = (r_1, \ldots, r_m)$; verification key vk.

Common inputs: Security parameter $\kappa$; statistical security parameter $\rho$; number
of base OTs $\ell$; number of check functions $\mu$; random oracle $G : \{0,1\}^{\kappa} \to \{0,1\}^{\ell}$;
random oracle $H : \mathbb{N} \times \{0,1\}^{\ell} \to \{0,1\}^n$; random oracle $H' : \{0,1\}^m \to \{0,1\}^{\kappa}$;
EU-CMA signature scheme $\Pi'$ = (KeyGen', Sign', Verify'); ideal functionality $\mathcal{F}_{\mathrm{OT}}$.

1. Initial OT Phase:

- S computes $s \in \{0,1\}^{\ell}$ as follows. Let $I$ be a set of indices, where $|I| = \kappa$.
For $i \in I$, S sets $s_i = 0$. Then, S fills the remaining bits at random.

- For $j \in [m]$, R computes $k_j \leftarrow_{\$} \{0,1\}^{\kappa}$ and sets $t_j \leftarrow G(k_j)$.

- Let $T$ be an $m \times \ell$ matrix, where the $j$th row is $t_j$ and the $i$th column
is $t^i$. Let $V$ be an $m \times \ell$ matrix, where the $j$th row is $v_j$ and the $i$th
column is $v^i$. S and R run $\mathcal{F}_{\mathrm{OT}}$ $\ell$ times in parallel, where S acts as the
receiver with input $s_i$ and R acts as the sender with input $(t^i, v^i)$.

2. OT Extension Phase (Part I):

- For $i \in [\ell]$, R sets $u^i \leftarrow t^i \oplus v^i \oplus r$, and sends $u^i$ to S.

3. Consistency check of $r$:

- Same as in the maliciously secure OT extension protocol of Asharov et al. [6].

4. OT Extension Phase (Part II):

- Let $Q$ be the $m \times \ell$ matrix where each column $q^i = (s_i \cdot (u^i \oplus v^i)) \oplus ((1 - s_i) \cdot t^i)$.
Note that $q^i = (s_i \cdot r) \oplus t^i$ and $q_j = (r_j \cdot s) \oplus t_j$.

- Let $I$ be the set defined in Step 1, and let $t_{j,i}$ denote the $i$th bit in row
$t_j$. S sends $I$ to R, who checks that $|I| = \kappa$ and otherwise aborts.

- For $j \in [m]$, S computes $y_j^0 \leftarrow x_j^0 \oplus H(j, q_j)$ and $y_j^1 \leftarrow x_j^1 \oplus H(j, q_j \oplus s)$
and signatures $\sigma_j' \leftarrow \mathrm{Sign}'_{sk}((I, j, y_j^0, \{t_{j,i}\}_{i \in I}))$ and
$\sigma_j'' \leftarrow \mathrm{Sign}'_{sk}((I, j, y_j^1, \{t_{j,i}\}_{i \in I}))$, and sends $y_j^0$, $y_j^1$, $\sigma_j'$, and $\sigma_j''$ to R.

- For $j \in [m]$, R computes $x_j \leftarrow y_j^{r_j} \oplus H(j, t_j)$.

5. Output:

- S outputs $\bot$; R outputs $\{x_j, (j, r_j, k_j, I, y_j^0, y_j^1, \{t_{j,i}\}_{i \in I}, \sigma_j', \sigma_j'')\}_{j \in [m]}$.


Fig. 4. Signed-OT extension, based on the OT extension protocol of Asharov et al. [6].


3.2 Towards a Proof of Security 

Before presenting the security proof, we first motivate the need for EU-CMPRA
signature schemes. As mentioned in Sect. 3.1, ideally we could just have S sign
everything using an EU-CMA signature scheme; however, this presents
opportunities for R to defame S. Thus, we need to enforce that R cannot output an $x_j^b$
value different from the one sent by S. We do so by using a binding commitment
scheme $\Pi_{\mathrm{Com}}$ = (ComGen, Com), and show that the messages sent by S in Step
4 are essentially binding commitments to the underlying $x_j^b$ values.

We define $\Pi_{\mathrm{Com}}$ as follows, where $G : \{0,1\}^* \to \{0,1\}^{\ell}$ and $H : \mathbb{N} \times \{0,1\}^{\ell} \to \{0,1\}^n$
are random oracles, and $\ell > \kappa$.
1. $\mathrm{ComGen}(1^{\kappa})$: choose set $I \subseteq [\ell]$ uniformly at random subject to $|I| = \kappa$;
output params $\leftarrow I$.

2. Com(params, $m$, $j$; $r$): On input parameters $I \leftarrow$ params, message $m$, counter
$j$, and randomness $r \in \{0,1\}^*$, proceed as follows. Compute $t \leftarrow G(r)$, set
com $\leftarrow (j, m \oplus H(j, t), I, \{t_i\}_{i \in I})$, and output com.
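A minimal Python sketch of this commitment scheme follows, assuming SHAKE-256 with domain-separation prefixes as a stand-in for the random oracles $G$ and $H$, and illustrative sizes ($\ell = 256$, $\kappa = 128$). The helper `check_opening` is a hypothetical name for recomputing the commitment from a revealed opening; it is not defined in the source.

```python
import hashlib
import secrets

ELL = 256  # bit length of t = G(r); an illustrative value


def G(r):
    # Stand-in for the random oracle G; returns a list of ELL bits.
    digest = hashlib.shake_256(b"G" + r).digest(ELL // 8)
    return [(byte >> k) & 1 for byte in digest for k in range(8)]


def H(j, t, n_bytes):
    # Stand-in for the random oracle H : N x {0,1}^ell -> {0,1}^n.
    return hashlib.shake_256(b"H" + j.to_bytes(8, "big") + bytes(t)).digest(n_bytes)


def com_gen(kappa=128):
    # params is a uniformly random index set I of size kappa.
    return tuple(sorted(secrets.SystemRandom().sample(range(ELL), kappa)))


def com(params, m, j, r):
    I = params
    t = G(r)
    masked = bytes(a ^ b for a, b in zip(m, H(j, t, len(m))))
    # com = (j, m xor H(j, t), I, {t_i : i in I})
    return (j, masked, I, tuple(t[i] for i in I))


def check_opening(commitment, params, m, j, r):
    # Opening reveals r; the verifier recomputes and compares.
    return com(params, m, j, r) == commitment
```

In the signed-OT extension protocol the roles map as described in the text that follows: `params` is the sender's set $I$ and `r` is the receiver's seed $k_j$.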

We make the assumption that given $I$, one can derive the randomness input
to ComGen. (We use this when defining our EU-CMPRA signature scheme below,
which uses a generic binding commitment scheme.) We can satisfy this by simply
letting the randomness input to ComGen be the set $I$.

In our signed-OT extension protocol, the set $I$ chosen by S is used as params
and the $k_j$ values chosen by R are used as the randomness to Com. The
commitment value com is exactly the message signed and sent by S in Step 4. Thus,
ignoring the signatures for now, we have an OT extension protocol that binds
S to its $x_j^b$ values, and thus prevents a malicious R from defaming an honest S.
Adding in the signatures (cf. Sect. 3.3) gives us an EU-CMPRA signature scheme.
Namely, S is tied to its messages due to the signatures and R is prevented from
"changing" the messages to defame S due to the binding property of the
commitment scheme.

We now prove that the commitment scheme described above is binding. We 
actually prove something stronger than what is required in our protocol. Namely, 
we prove that an adversary who can control both random values still cannot 
win, whereas when we use this commitment scheme in our signed-OT extension 
protocol, only one of the two random values can be controlled by any one party. 

Theorem 1. Protocol $\Pi_{\mathrm{Com}}$ is binding according to Definition 3.

Proof. Adversary $\mathcal{A}$ needs to come up with choices of $I$, $m$, $m'$, $j$, $j'$, $r$, and
$r'$ such that $(j, m \oplus H(j, t), I, \{t_i\}_{i \in I}) = (j', m' \oplus H(j', t'), I, \{t_i'\}_{i \in I})$, where
$t \leftarrow G(r)$ and $t' \leftarrow G(r')$. Clearly, $j = j'$. Thus, $\mathcal{A}$ must find $t$ and $t'$ such
that $t_i = t_i'$ for all $i \in I$. However, by the property that $G$ is a random oracle,
the values $t$ and $t'$ are distributed uniformly at random in $\{0,1\}^{\ell}$. Thus, the
probability that $\mathcal{A}$ finds two bitstrings $t$ and $t'$ that match in $\kappa$ bits is negligible,
regardless of the choice of $I$. ■
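One way to make the last step of the proof quantitative is a union bound, sketched here under an assumption the source does not state: the adversary makes at most $q$ queries to $G$, with $q$ polynomial in $\kappa$. Note that a winning pair must use $r \neq r'$, since $r = r'$ and $j = j'$ would force $m = m'$.

```latex
% For one fixed pair r \neq r', the outputs t = G(r) and t' = G(r') are
% independent and uniform, and I is fixed before A sees params, so
\Pr\bigl[\, t_i = t'_i \ \text{for all } i \in I \,\bigr] = 2^{-|I|} = 2^{-\kappa}.
% Taking a union bound over all pairs among at most q queries to G:
\Pr\bigl[\text{Com-bind}_{\mathcal{A},\Pi_{\mathrm{Com}}}(\kappa) = 1\bigr]
  \;\le\; \binom{q}{2}\, 2^{-\kappa} \;\le\; \frac{q^{2}}{2^{\kappa+1}},
% which is negligible in \kappa whenever q is polynomial in \kappa.
```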

3.3 An EU-CMPRA Signature Scheme 

We now show that the messages sent by S in Step 4 form an EU-CMPRA signature
scheme. Let $\Pi'$ = (Gen', Sign', Verify') be an EU-CMA signature scheme and
$\Pi_{\mathrm{Com}}$ = (ComGen, Com) be a commitment scheme satisfying Definition 3 (e.g.,
the scheme presented in Sect. 3.2). Consider the scheme $\Pi$ = (Gen, Sign, Verify)
defined as follows.

1. $\mathrm{Gen}(1^{\kappa})$: On input $1^{\kappa}$, run $(vk, sk) \leftarrow \mathrm{Gen}'(1^{\kappa})$ and output (vk, sk).

2. $\mathrm{Sign}_{sk}(m, j; (r_1^*, r_2^*))$: On input message $m \in \{0,1\}^*$, counter $j \in \mathbb{N}$, and
randomness $r_1^*$ and $r_2^*$, proceed as follows. Compute params $\leftarrow \mathrm{ComGen}(1^{\kappa}; r_1^*)$
and com $\leftarrow$ Com(params, $m$, $j$; $r_2^*$). Next, choose $m' \leftarrow_{\$} \{0,1\}^*$ and compute
com' $\leftarrow$ Com(params, $m'$, $j$; $r_2^*$)$^6$. Output $\sigma \leftarrow (j, \mathrm{params}, r_2^*, \mathrm{com}, \mathrm{com}', \mathrm{Sign}'_{sk}((\mathrm{params}, \mathrm{com})), \mathrm{Sign}'_{sk}((\mathrm{params}, \mathrm{com}')))$.

3. $\mathrm{Verify}_{vk}(m, \sigma)$: On input message $m$ and signature $\sigma$, parse $\sigma$ as
$(j, \mathrm{params}, r, \mathrm{com}', \mathrm{com}'', \sigma', \sigma'')$, and output 1 if and only if (1) Com(params, $m$, $j$; $r$) = com',
(2) $\mathrm{Verify}'_{vk}((\mathrm{params}, \mathrm{com}'), \sigma') = 1$, and (3) $\mathrm{Verify}'_{vk}((\mathrm{params}, \mathrm{com}''), \sigma'') = 1$;
otherwise output 0.

As explained in Sect. 3.2, this signature scheme exactly captures the behavior 
of S in our signed-OT extension protocol. We now prove that this is indeed an 
EU-CMPRA signature scheme. 

Theorem 2. Given an EU-CMA signature scheme $\Pi'$ = (Gen', Sign', Verify')
and a commitment scheme $\Pi_{\mathrm{Com}}$ = (ComGen, Com) secure according to
Definition 3, then $\Pi$ = (Gen, Sign, Verify) described above is an EU-CMPRA signature
scheme.

Proof. Let $\mathcal{A}$ be a ppt adversary attacking $\Pi$. We construct an adversary $\mathcal{B}$
attacking $\Pi'$. Adversary $\mathcal{B}$ receives vk from the challenger and initializes $\mathcal{A}$
with vk as input. Let $(m, j, r_1^*, r_2^*)$ be the input of $\mathcal{A}$ to its signing oracle.
Adversary $\mathcal{B}$ emulates the execution of $\mathcal{A}$'s signing oracle as follows: it
computes params $\leftarrow \mathrm{ComGen}(1^{\kappa}; r_1^*)$ and com $\leftarrow$ Com(params, $m$, $j$; $r_2^*$), chooses $m'$
uniformly at random and computes com' $\leftarrow$ Com(params, $m'$, $j$; $r_2^*$), constructs
$\sigma \leftarrow (j, \mathrm{params}, r_2^*, \mathrm{com}, \mathrm{com}', \mathrm{Sign}'_{sk}((\mathrm{params}, \mathrm{com})), \mathrm{Sign}'_{sk}((\mathrm{params}, \mathrm{com}')))$, and
sends $\sigma$ to $\mathcal{A}$. After each of $\mathcal{A}$'s queries, $\mathcal{B}$ stores $(m, j)$ in set $Q_{\mathcal{A}}$ and stores all
the messages it sent to its signing oracle in set $Q_{\mathcal{B}}$.

Eventually, $\mathcal{A}$ outputs $(m, (j, \sigma'))$ as its forgery. Adversary $\mathcal{B}$ checks that
$\mathrm{Verify}_{vk}(m, (j, \sigma')) = 1$ and that $(m, j) \notin Q_{\mathcal{A}}$. If not, $\mathcal{B}$ outputs 0. Otherwise, $\mathcal{B}$
parses $\sigma'$ as (params, $r$, com', com'', $\sigma'$, $\sigma''$) and checks that com' $\notin Q_{\mathcal{B}}$. If so, it
outputs (com', $\sigma'$); otherwise it outputs 0.

Note that $\text{Sig-forge}^{\mathrm{CMPRA}}_{\mathcal{A},\Pi}(\kappa) = 1$ and $\text{Sig-forge}^{\mathrm{CMA}}_{\mathcal{B},\Pi'}(\kappa) = 0$ if and only if
$\mathrm{Verify}_{vk}(m, (j, \mathrm{params}, r, \mathrm{com}', \mathrm{com}'', \sigma', \sigma'')) = 1$ and $(m, j) \notin Q_{\mathcal{A}}$ but com' $\in Q_{\mathcal{B}}$.
Fix some $(m, (j, \mathrm{params}, r, \mathrm{com}_1, \mathrm{com}_1', \sigma_1, \sigma_1'))$ such that this is the case.
Thus it holds that $\mathrm{com}_1 \in Q_{\mathcal{B}}$. This implies that $\mathcal{B}$ queried Sign' on $\mathrm{com}_1$,
which means that $\mathcal{A}$ queried its signing oracle on some $(m', j', r_1^*, r_2^*)$, where
$m' \neq m$, and received back $(j', \mathrm{params}, r', \mathrm{com}_1, \mathrm{com}_2', \sigma_1, \sigma_2')$. However, this
implies that Com(params, $\mathrm{com}_1$; $r$) = $m$ and Com(params, $\mathrm{com}_1$; $r'$) = $m'$. Thus,
$\Pr[\text{Sig-forge}^{\mathrm{CMPRA}}_{\mathcal{A},\Pi}(\kappa)] = \Pr[\text{Sig-forge}^{\mathrm{CMA}}_{\mathcal{B},\Pi'}(\kappa)] + \Pr[\text{Com-bind}_{\mathcal{B}',\Pi_{\mathrm{Com}}}(\kappa)]$ for some
ppt adversary $\mathcal{B}'$. We now bound $\Pr[\text{Com-bind}_{\mathcal{B}',\Pi_{\mathrm{Com}}}(\kappa)]$.

Adversary $\mathcal{B}'$ runs almost exactly like $\mathcal{B}$. On the first query $(m, j, r_1^*, r_2^*)$ by
$\mathcal{A}$, it sets $r = r_1^*$ if $r_1^* \neq \bot$ and otherwise it sets $r$ uniformly at random; $\mathcal{B}'$ then
sends $r$ to $\mathcal{C}$, receiving back params.

Let $(m_1, j_1, r_1^*, r_2^*)$ and $(m_2, j_2, r_1^*, r_2^*)$ be the two queries made by $\mathcal{A}$ resulting
in a common commitment value. Let $(j_1, \mathrm{params}, r_1, \mathrm{com}_1, \mathrm{com}_1', \sigma_1, \sigma_1')$ and
$(j_2, \mathrm{params}, r_2, \mathrm{com}_1, \mathrm{com}_2', \sigma_1, \sigma_2')$ be the corresponding signatures resulting
from $\mathcal{A}$'s queries. Adversary $\mathcal{B}'$ sends $(\mathrm{com}_1, m_1, j_1, r_1, m_2, j_2, r_2)$ to its
challenger and wins with probability one, contradicting the security of the
commitment scheme. Thus, we have that $\Pr[\text{Com-bind}_{\mathcal{B}',\Pi_{\mathrm{Com}}}(\kappa)] \le \mathrm{negl}(\kappa)$, completing
the proof. ■

$^6$ This extra commitment on a random message is needed for our signed-OT extension
proof.


3.4 Proof of Security 

We are now ready to prove the security of our signed-OT extension protocol.
Most of the proof complexity is hidden in the proofs of the associated EU-CMPRA
signature scheme and commitment scheme. Thus, the signed-OT extension
simulator is relatively straightforward, and mostly involves parsing the output of
$\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$ and passing the correct values to the adversary. The analysis follows
almost exactly that of Asharov et al. [6] and thus we elide most of the details.

Theorem 3. Let $\Pi$ = (Gen, Sign, Verify) be the EU-CMPRA signature scheme
in Sect. 3.3. Then the protocol in Fig. 4 is a secure realization of $\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$ in
the $\mathcal{F}_{\mathrm{OT}}$-hybrid model.

Proof. We separately consider the case where S is malicious and R is malicious. 
The case where the parties are either both honest or both malicious is straight- 
forward. 


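Malicious S. Let $\mathcal{A}$ be a ppt adversary corrupting S. We construct a simulator
$\mathcal{S}$ as follows.

1. The simulator $\mathcal{S}$ acts as an honest R would in Step 1, extracting $s$ from $\mathcal{A}$'s
input to $\mathcal{F}_{\mathrm{OT}}$.

2. The simulator $\mathcal{S}$ acts as an honest R would in Steps 2 and 3.

3. Let $I$ and $(j, y_j^0, y_j^1, \{t_{j,i}\}_{i \in I}, \sigma_{j,0}', \sigma_{j,1}')$, for $j \in [m]$, be the messages sent
by $\mathcal{A}$ in Step 4. If any of these are invalid, $\mathcal{S}$ sends abort to $\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$ and
simulates R aborting, outputting whatever $\mathcal{A}$ outputs.

4. For $j \in [m]$, proceed as follows. The simulator $\mathcal{S}$ extracts $x_j^0 \leftarrow y_j^0 \oplus H(j, q_j)$
and $x_j^1 \leftarrow y_j^1 \oplus H(j, q_j \oplus s)$, constructs
$\sigma_{j,b}^* \leftarrow (j, I, k_j, (I, (j, y_j^b, I, \{t_{j,i}\}_{i \in I})), (I, (j, y_j^{1-b}, I, \{t_{j,i}\}_{i \in I})), \sigma_{j,b}', \sigma_{j,1-b}')$
for $b \in \{0,1\}$, and sends $\sigma_{j,0}^*$ and $\sigma_{j,1}^*$ to $\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$, receiving back either
$((b, m_b), \sigma_{j,b})$ or abort.

5. If $\mathcal{S}$ received abort in any of the above iterations, it simulates R aborting,
outputting whatever $\mathcal{A}$ outputs. Otherwise, for $j \in [m]$, $\mathcal{S}$ parses $\sigma_{j,b}$ as
$(j, I, k_j, (I, (j, y_j^b, I, \{t_{j,i}\}_{i \in I})), (I, (j, y_j^{1-b}, I, \{t_{j,i}\}_{i \in I})), \sigma_{j,b}', \sigma_{j,1-b}')$, constructs message
$\sigma_j \leftarrow (j, y_j^0, y_j^1, \{t_{j,i}\}_{i \in I}, \sigma_{j,0}', \sigma_{j,1}')$, and acts as an honest R would when
receiving messages $I$ and $\{\sigma_j\}_{j \in [m]}$.

6. The simulator $\mathcal{S}$ outputs whatever $\mathcal{A}$ outputs.


It is easy to see that this protocol perfectly simulates a malicious sender since $\mathcal{S}$
acts exactly as an honest R would (beyond feeding the appropriate messages to
$\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$).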

Malicious R. Let $\mathcal{A}$ be a ppt adversary corrupting R. We construct a simulator
$\mathcal{S}$ as follows.

1. The simulator $\mathcal{S}$ acts as an honest S would in Step 1, extracting matrices $T$
and $V$ through R's $\mathcal{F}_{\mathrm{OT}}$ inputs, and thus the values $\{k_j\}_{j \in [m]}$.

2. The simulator $\mathcal{S}$ uses the values extracted above to extract selection bits $r$
after receiving the $u^i$ values from $\mathcal{A}$ in Step 2.

3. The simulator $\mathcal{S}$ acts as an honest S would in Step 3.

4. Let $I^0$ be the indices at which $s$ (generated in Step 1) is zero, and let $I \subseteq I^0$ be
a set of size $\kappa$. For $j \in [m]$, $\mathcal{S}$ sends $r_j$, vk, and $I$ to $\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$, receiving back
$((r_j, x_j^{r_j}), \sigma_{j,r_j})$; $\mathcal{S}$ parses $\sigma_{j,r_j}$ as
$(j, I, r, (I, (j, c_{r_j}, I, \{t_{j,i}\}_{i \in I})), (I, (j, c_{1-r_j}, I, \{t_{j,i}\}_{i \in I})), \sigma_{j,r_j}', \sigma_{j,1-r_j}')$.

5. In Step 4, $\mathcal{S}$ sends $I$ and $(j, c_0, c_1, \sigma_{j,0}', \sigma_{j,1}')$, for $j \in [m]$, to $\mathcal{A}$.

6. The simulator $\mathcal{S}$ outputs whatever $\mathcal{A}$ outputs.

The analysis is almost exactly that of the malicious receiver proof in the con- 
struction of Asharov et al. [6]; we thus give an informal security argument here 
and refer the reader to the aforementioned work for the full details. 

A malicious R has two main attacks: using inconsistent choices of its selection
bits $r$ and trying to cheat in the signature creation in Step 4. This latter attack
is prevented by the security of our EU-CMPRA signature scheme. The former is
prevented by the consistency check in Step 3. Namely, Asharov et al. show that
the consistency check guarantees that: (1) most inputs are consistent with some
string $r$, and (2) the number of inconsistent inputs is small and thus allows R
to learn only a small number of bits of $s$. Thus, for specific choices of $\ell$ and $\mu$,
the probability of a malicious R cheating is negligible. Asharov et al. provide
concrete parameters for various settings of the security parameter [6, §3.2]; let
$\ell'$ denote the number of base OTs used in their protocol. Now, in our protocol
we set $\ell = \ell' + \kappa$; S leaks $\kappa$ bits of $s$ when revealing the set $I$ in Step 4, and
so is left with $\ell'$ unknown bits of $s$. Thus, the security argument presented by
Asharov et al. carries over into our setting. ■

4 Our Complete PVC Protocol 

As noted above, the main technical challenge of the PVC model is in the
signed-OT construction and model definitions. The AO protocol in the
$\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$-hybrid model is relatively straightforward: the natural (but careful)
combination of taking a non-halting covert protocol, having the GC generator $P_1$ sign
appropriate messages, and replacing OTs with signed-OTs works. In particular,
our signed-OT extension can be naturally modified and used in place of the
signed-OT primitive in the AO protocol.

In this section we present a new PVC protocol based on signed-OT extension.
Our protocol is similar to the AO protocol in the $\mathcal{F}^{\Pi}_{\mathrm{signedOT}}$-hybrid model, but
applies several simple yet very effective optimizations, resulting in a much
lower communication cost.

We present our protocol by starting off with the AO protocol and pointing 
out the differences. We presented the AO protocol intuition in the Introduction; 
see Fig. 5 for its formal description; due to lack of space, we omit the (straight- 
forward) Blame and Judgment algorithms. In presenting our changes, we sketch 
Private inputs: $P_1$ has input $x_1 \in \{0,1\}^n$ and $P_2$ has input $x_2 \in \{0,1\}^n$.
Common inputs: Security parameter $\kappa$; XOR-tree replication factor $\nu$; garbled
circuit replication factor $\lambda$; circuit $C(\cdot,\cdot)$; commitment scheme $\Pi_{com}$ =
(Com, Open); ideal functionalities $\mathcal{F}^{\Pi}_{signedOT}$ and $\binom{\lambda}{1}$-$\mathcal{F}^{\Pi}_{signedOT}$ for EU-CMPRA
signature scheme $\Pi$.

1. $P_1$ and $P_2$ define a new circuit $C'(x_1, x_2^1, \ldots, x_2^{\nu}) = C(x_1, \bigoplus_{i \in [\nu]} x_2^i)$. Let
$w_1, \ldots, w_n$ denote the input wires of $x_1$ and let $w_{n+(i-1)\nu+1}, \ldots, w_{n+i\nu}$ denote
the input wires of $x_2^i$.

2. For $i \in [\nu - 1]$, $P_2$ chooses $x_2^i \leftarrow_\$ \{0,1\}^n$. $P_2$ sets $x_2^{\nu} \leftarrow (\bigoplus_{i \in [\nu-1]} x_2^i) \oplus x_2$.

3. For $j \in [\lambda]$, $i \in [n + \nu n]$, and $b \in \{0,1\}$, $P_1$ chooses $k^j_{w_i,b} \leftarrow_\$ \{0,1\}^{\kappa}$.

4. $P_1$ and $P_2$ run $\nu n$ instantiations of $\mathcal{F}^{\Pi}_{signedOT}$, where in the $i$th execution
$P_1$ acts as the sender with input $(k^1_{w_{n+i},0} \| \cdots \| k^{\lambda}_{w_{n+i},0},\; k^1_{w_{n+i},1} \| \cdots \| k^{\lambda}_{w_{n+i},1})$
and $P_2$ acts as the receiver with input $x_2^{\lceil i/n \rceil}[i \bmod \nu]$. If $P_2$'s output is abort$_1$,
it outputs abort$_1$.

5. For $j \in [\lambda]$, $P_1$ constructs garbled circuit $GC_j$ of circuit $C'$, where for $i \in [n + \nu n]$
the keys for input wire $w_i$ are $k^j_{w_i,0}$ and $k^j_{w_i,1}$. $P_1$ sends $(GC_j, \mathsf{Sign}(GC_j))$
to $P_2$, who checks that the signature is valid; if not, $P_2$ outputs abort$_1$.

6. For $i \in [n]$ and $j \in [\lambda]$, $P_1$ chooses $b \leftarrow_\$ \{0,1\}$, computes commitments
$(c^j_{w_i,0}, o^j_{w_i,0}) \leftarrow \mathsf{Com}(k^j_{w_i,0})$ and $(c^j_{w_i,1}, o^j_{w_i,1}) \leftarrow \mathsf{Com}(k^j_{w_i,1})$, and
sends $(c^j_{w_i,b}, \mathsf{Sign}(c^j_{w_i,b}))$ and $(c^j_{w_i,\bar{b}}, \mathsf{Sign}(c^j_{w_i,\bar{b}}))$ to $P_2$, who checks that the
signatures are valid; if not, $P_2$ outputs abort$_1$.

7. $P_1$ and $P_2$ run $\binom{\lambda}{1}$-$\mathcal{F}^{\Pi}_{signedOT}$ with $P_1$ as the sender inputting
$(\{k^i_{w_p,b}\}_{i \in [\lambda] \setminus \{j\},\, p \in [n+\nu n],\, b \in \{0,1\}},\; \{o^i_{w_p,b}\}_{i \in [\lambda] \setminus \{j\},\, p \in [n],\, b \in \{0,1\}},\; \{k^j_{w_i,x_1[i]}\}_{i \in [n]})$
as its $j$th input and $P_2$ as the receiver inputting $\gamma \leftarrow_\$ [\lambda]$ as its input; if $P_2$'s
output is abort$_1$, it outputs abort$_1$.

8. $P_2$ does the following:

- For $j \in [\lambda] \setminus \{\gamma\}$, $i \in [n]$, and $b \in \{0,1\}$, $P_2$ checks that
  $\mathsf{Open}(c^j_{w_i,b}, o^j_{w_i,b}) = k^j_{w_i,b}$. If not, $P_2$ sets key $\leftarrow$ InvalidDecommitment and moves to
  Step 9.

- For $j \in [\lambda] \setminus \{\gamma\}$, $P_2$ uses the input wire keys received from the signed-OT
  in Step 7 to check that $GC_j$ is a correctly garbled circuit. If not, $P_2$ sets
  key $\leftarrow$ InvalidCircuit and moves to Step 9.

- For $j \in [\lambda] \setminus \{\gamma\}$, $P_2$ checks that the keys received in the signed-OT
  in Step 4 match the keys sent by $P_1$ in Step 7. If not, $P_2$ sets key $\leftarrow$
  SelectiveOTAttack and moves to Step 9.

9. If any of the above checks fail, $P_2$ computes Cert $\leftarrow$ Blame(id$_1$, key, View$_2$),
publishes Cert, and outputs corrupted$_1$. Otherwise, $P_2$ uses the keys to compute
$C'(x_1, x_2^1, \ldots, x_2^{\nu})$ and outputs the result.


Fig. 5. The AO PVC protocol [7, Protocol 3].
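
The XOR-tree encoding of the second party's input in Step 2 can be illustrated with a short Python sketch. The helper names `xor_share` and `xor_combine` are hypothetical, and representing inputs as byte strings is an assumption made for readability; the protocol itself is stated over n-bit strings.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def xor_share(x2: bytes, nu: int) -> list[bytes]:
    """Split x2 into nu shares whose XOR equals x2 (Step 2 of the protocol)."""
    # The first nu - 1 shares are uniformly random.
    shares = [secrets.token_bytes(len(x2)) for _ in range(nu - 1)]
    # The last share is fixed so that all nu shares XOR back to x2.
    last = reduce(xor_bytes, shares, x2)
    return shares + [last]


def xor_combine(shares: list[bytes]) -> bytes:
    """Recombine the shares, as the XOR-tree at the top of C' does."""
    return reduce(xor_bytes, shares)
```

Because any nu - 1 of the shares are uniformly random, a selective-failure attack on a single OT reveals nothing about the real input bit unless all nu shares of that bit are attacked.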


the improvement each of them brings. Thus, we start by reviewing the commu- 
nication cost of the AO protocol. 

Communication Cost of the AO Protocol. Using state-of-the-art optimizations
[13,19,20], the size of each GC sent in Step 5 is $2\kappa|G_C|$, where $|G_C|$ is
the number of non-XOR gates in circuit $C$ (note that $|G_C| = |G_{C'}|$ for circuit
$C'$ generated in Step 1 since the XOR-tree only adds XOR gates to the
circuit, which are “free” [13]). Let $\tau$ be the field size (in bits), $\nu$ the XOR-tree
replication factor, $\lambda$ the GC replication factor, and $n$ the length of the
inputs, and assume that each signature is of length $\tau$ and the commitment and
decommitment values are of length $\kappa$. Using the signed-OT instantiations of
Asharov and Orlandi [7, Protocols 1 and 2], we get a total communication cost
of $\tau(7\nu n + 11) + 2\lambda\kappa\nu n + \lambda(2\kappa|G_C| + \tau) + 2n\lambda(\kappa + \tau) + \tau(3 + 2\lambda + 11(\lambda - 1)) + \lambda\kappa(2(n + \nu n)(\lambda - 1) + 2n(\lambda - 1) + n)$.

As an example, consider the secure computation of AES(m, k), where $P_1$
inputs message $m \in \{0,1\}^{128}$ and $P_2$ inputs key $k \in \{0,1\}^{128}$, and suppose we
set both the GC replication factor $\lambda$ and the XOR-tree replication factor $\nu$ to 3,
giving a cheating probability of $\epsilon = 1/2$. Letting $\kappa = 128$ and $\tau = 256$, we have
a total communication cost of 9.3 Mbit (where we assume that the AES circuit
has 9,100 non-XOR gates [15]).
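
As a sanity check on the 9.3 Mbit figure, the cost expression can be evaluated term by term. The sketch below is only an illustration: the function name `ao_comm_cost` and the names given to the individual terms are our own labels for the six summands, not terminology from the protocol description. The parameters are the security parameter (`kappa`), the field size (`tau`), the GC replication factor (`lam`), the XOR-tree replication factor (`nu`), the input length (`n`), and the number of non-XOR gates (`gates`).

```python
def ao_comm_cost(kappa: int, tau: int, lam: int, nu: int, n: int, gates: int) -> int:
    """Total communication of the AO protocol in bits, one summand per line."""
    signed_ot = tau * (7 * nu * n + 11) + 2 * lam * kappa * nu * n
    garbled_circuits = lam * (2 * kappa * gates + tau)
    commitments = 2 * n * lam * (kappa + tau)
    choose_ot = tau * (3 + 2 * lam + 11 * (lam - 1))
    choose_ot_payload = lam * kappa * (
        2 * (n + nu * n) * (lam - 1) + 2 * n * (lam - 1) + n
    )
    return signed_ot + garbled_circuits + commitments + choose_ot + choose_ot_payload


# AES example: kappa = 128, tau = 256, lambda = nu = 3, n = 128, 9,100 non-XOR gates.
aes_bits = ao_comm_cost(kappa=128, tau=256, lam=3, nu=3, n=128, gates=9100)
print(f"{aes_bits / 1e6:.1f} Mbit")
```

With these parameters the garbled circuits account for roughly three quarters of the total, which is why sending hashes of the check circuits instead of the circuits themselves is the largest single saving.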

Our Modifications. We make the following modifications to the AO protocol:

- In Step 6, instead of using a commitment scheme we can use a hash function.
This saves on communication in Step 7 as $P_1$ no longer needs to send the
openings $\{o^j_{w_i,b}\}$ to the commitments in the signed-OT, and is secure when
treating $H$ as a random oracle since the keys are generated uniformly at
random and thus it is infeasible for $P_2$ to guess the committed values. The
total savings are $2n(\lambda - 1)\kappa\lambda$ bits; in our example, this saves us 196 kbit.

- In Step 3, we use a random seed to generate the input wire keys. Namely,
for all $j \in [\lambda]$ we compute $s_j \leftarrow_\$ \{0,1\}^{\kappa}$, and compute the input wire keys
for circuit $j$ as $k^j_{w_1,0} \| k^j_{w_1,1} \| \cdots \| k^j_{w_{n+\nu n},0} \| k^j_{w_{n+\nu n},1} \leftarrow G(s_j)$, where $G$ is a
pseudorandom generator. Now, in the 1-out-of-$\lambda$ signed-OT in Step 7 we can
just send the seeds to the input wire keys rather than the input wire keys
themselves. The total savings are $2(n + \nu n)(\lambda - 1)\lambda\kappa - n(\lambda - 1)\lambda\kappa$ bits; in our
example, this saves us 688 kbit.

- In Step 5, $P_1$ generates each $GC_j$ from a seed $s^j_{GC}$. (This idea was first put
forward by Goyal et al. [11].) That is, $s^j_{GC}$ specifies the randomness used
to construct all wire keys except for the input wire keys which were set in
Step 3. Instead of $P_1$ sending each GC to $P_2$ in Step 5, $P_1$ instead sends a
commitment $c^j_{GC} \leftarrow H(GC_j)$. Now, in Step 7, $P_1$ can send the appropriate
seeds $\{s^i_{GC}\}_{i \in [\lambda] \setminus \{j\}}$ in the $j$th input of the 1-out-of-$\lambda$ signed-OT to allow $P_2$
to check the correctness of the check GCs. We then add an additional step
where, if the checks pass, $P_1$ sends $GC_\gamma$ (along with a signature on $GC_\gamma$) to
$P_2$, who can check whether $H(GC_\gamma) = c^{\gamma}_{GC}$. Note that this does not violate
the security conditions required by the PVC model because $P_2$ catches any
cheating of $P_1$ before the evaluation circuit is sent. If $P_1$ tries to cheat here,
$P_2$ already has a commitment to the circuit so can detect any cheating. The
total savings are $(\lambda - 1)2\kappa|G_C| - \lambda\tau - \lambda\kappa(\lambda - 1)$ bits; in our example, this
saves us 4.6 Mbit.
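
The three savings can be checked numerically for the AES example (security parameter 128, field size 256, both replication factors 3, input length 128, 9,100 non-XOR gates). This is a small illustrative sketch; the function name `pvc_savings` and the tuple layout are assumptions of ours.

```python
def pvc_savings(kappa: int, tau: int, lam: int, nu: int, n: int, gates: int):
    """Savings in bits of the three modifications, in the order listed above."""
    # Hash instead of commitment scheme: no openings are sent in Step 7.
    hash_commit = 2 * n * (lam - 1) * kappa * lam
    # Input wire keys derived from a seed: send seeds rather than keys.
    seeded_keys = (2 * (n + nu * n) * (lam - 1) * lam * kappa
                   - n * (lam - 1) * lam * kappa)
    # Garbled circuits derived from a seed: send hashes of the check circuits.
    seeded_gc = (lam - 1) * 2 * kappa * gates - lam * tau - lam * kappa * (lam - 1)
    return hash_commit, seeded_keys, seeded_gc


hash_commit, seeded_keys, seeded_gc = pvc_savings(128, 256, 3, 3, 128, 9100)
for name, bits in [("hash", hash_commit), ("seeded keys", seeded_keys), ("seeded GC", seeded_gc)]:
    print(f"{name}: {bits / 1000:.0f} kbit")
```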
Private inputs: $P_1$ has input $x_1 \in \{0,1\}^n$; $P_2$ has input $x_2 \in \{0,1\}^n$.
Common inputs: Security parameter $\kappa$; XOR-tree replication factor $\nu$; garbled
circuit replication factor $\lambda$; circuit $C(\cdot,\cdot)$; hash function $H : \{0,1\}^* \to \{0,1\}^{\kappa}$;
pseudorandom generator $G : \{0,1\}^{\kappa} \to \{0,1\}^{2(n+\nu n)\kappa}$; ideal functionalities
$\mathcal{F}^{\Pi}_{signedOT}$ and $\binom{\lambda}{1}$-$\mathcal{F}^{\Pi}_{signedOT}$ for EU-CMPRA signature scheme $\Pi$.

1. $P_1$ and $P_2$ define a new circuit $C'(x_1, x_2^1, \ldots, x_2^{\nu}) = C(x_1, \bigoplus_{i \in [\nu]} x_2^i)$. Let
$w_1, \ldots, w_n$ denote the input wires of $x_1$ and let $w_{n+(i-1)\nu+1}, \ldots, w_{n+i\nu}$ denote
the input wires of $x_2^i$.

2. For $i \in [\nu - 1]$, $P_2$ chooses $x_2^i \leftarrow_\$ \{0,1\}^n$ and sets $x_2^{\nu} \leftarrow (\bigoplus_{i \in [\nu-1]} x_2^i) \oplus x_2$.

3. For $j \in [\lambda]$, $P_1$ chooses $s_j \leftarrow_\$ \{0,1\}^{\kappa}$ and computes
$k^j_{w_1,0} \| k^j_{w_1,1} \| \cdots \| k^j_{w_{n+\nu n},0} \| k^j_{w_{n+\nu n},1} \leftarrow G(s_j)$.

4. $P_1$ and $P_2$ run $\nu n$ instantiations of $\mathcal{F}^{\Pi}_{signedOT}$, where in the $i$th execution
$P_1$ acts as the sender with input $(k^1_{w_{n+i},0} \| \cdots \| k^{\lambda}_{w_{n+i},0},\; k^1_{w_{n+i},1} \| \cdots \| k^{\lambda}_{w_{n+i},1})$
and $P_2$ acts as the receiver with input $x_2^{\lceil i/n \rceil}[i \bmod \nu]$. If $P_2$'s output is abort$_1$,
it outputs abort$_1$.

5. For $j \in [\lambda]$, $P_1$ computes $s^j_{GC} \leftarrow_\$ \{0,1\}^{\kappa}$ and uses $s^j_{GC}$ as the randomness
used to generate garbled circuit $GC_j$, where for $i \in [n + \nu n]$ the keys for
input wire $w_i$ are $k^j_{w_i,0}$ and $k^j_{w_i,1}$. $P_1$ computes $c^j_{GC} \leftarrow H(GC_j)$ and sends
$(c^j_{GC}, \mathsf{Sign}(c^j_{GC}))$ to $P_2$, who checks that the signature is valid; if not, $P_2$
outputs abort$_1$.

6. For $i \in [n]$ and $j \in [\lambda]$, $P_1$ computes $c^j_{w_i,0} \leftarrow H(k^j_{w_i,0})$ and $c^j_{w_i,1} \leftarrow H(k^j_{w_i,1})$,
and sends $(c^j_{w_i,b}, \mathsf{Sign}(c^j_{w_i,b}))$, $(c^j_{w_i,\bar{b}}, \mathsf{Sign}(c^j_{w_i,\bar{b}}))$ to $P_2$, where $b \leftarrow_\$ \{0,1\}$. $P_2$
checks that the signatures are valid; if not, $P_2$ outputs abort$_1$.

7. $P_1$ and $P_2$ run $\binom{\lambda}{1}$-$\mathcal{F}^{\Pi}_{signedOT}$ with $P_1$ as the sender and $P_2$ as the receiver.
$P_2$ uses $\gamma \leftarrow_\$ [\lambda]$ as its input and $P_1$ uses $(\{s_i, s^i_{GC}\}_{i \in [\lambda] \setminus \{j\}},\; \{k^j_{w_i,x_1[i]}\}_{i \in [n]})$
as its $j$th input. If $P_2$'s output is abort$_1$, it outputs abort$_1$.

8. $P_2$ does the following:

- For $j \in [\lambda] \setminus \{\gamma\}$, $i \in [n]$, and $b \in \{0,1\}$, $P_2$ checks that $H(k^j_{w_i,b}) = c^j_{w_i,b}$.
  If not, $P_2$ sets key $\leftarrow$ InvalidDecommitment and moves to Step 12.

- For $j \in [\lambda] \setminus \{\gamma\}$, $P_2$ uses $s_j$ and $s^j_{GC}$ received from $\binom{\lambda}{1}$-$\mathcal{F}^{\Pi}_{signedOT}$ to check
  that $GC_j$ is a correctly garbled circuit and that $H(GC_j) = c^j_{GC}$. If not,
  $P_2$ sets key $\leftarrow$ InvalidCircuit and moves to Step 12.

- For $j \in [\lambda] \setminus \{\gamma\}$, $P_2$ checks that the keys received in $\mathcal{F}^{\Pi}_{signedOT}$ match
  the keys generated by $s_j$ received in Step 7. If not, $P_2$ sets key $\leftarrow$
  SelectiveOTAttack and moves to Step 12.

9. Let $((\gamma, m_\gamma), \sigma)$ be $P_2$'s output of $\binom{\lambda}{1}$-$\mathcal{F}^{\Pi}_{signedOT}$. $P_2$ sends $(\gamma, \sigma)$ to $P_1$, who
checks that the signature is valid and otherwise outputs abort$_2$.

10. $P_1$ sends $(GC_\gamma, \mathsf{Sign}(GC_\gamma))$ to $P_2$, who checks that the signature is valid; if
not, $P_2$ outputs abort$_1$.

11. $P_2$ checks that $H(GC_\gamma) = c^{\gamma}_{GC}$. If not, $P_2$ sets key $\leftarrow$ InvalidCircuitHash and
moves to Step 12.

12. If any of the above checks fail, $P_2$ computes Cert $\leftarrow$ Blame(id$_1$, key, View$_2$),
publishes Cert, and outputs corrupted$_1$. Otherwise, $P_2$ uses the keys to compute
$C'(x_1, x_2^1, \ldots, x_2^{\nu})$ and outputs the result.


Fig. 6. Our PVC protocol.
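
To make Steps 3, 6 and 8 more concrete, the following Python sketch derives all input wire keys of one circuit from a seed and checks hash commitments against them. It is a toy illustration under stated assumptions, not the protocol's instantiation: SHAKE-256 stands in for the pseudorandom generator G, truncated SHA-256 stands in for the hash H, and the helper names `expand_seed`, `commit` and `check_commitments` are hypothetical.

```python
import hashlib

KAPPA_BYTES = 16  # kappa = 128 bits (assumed for this sketch)


def expand_seed(seed: bytes, n: int, nu: int) -> list[tuple[bytes, bytes]]:
    """Stand-in for G: derive the key pair (k_0, k_1) of each of the n + nu*n input wires."""
    wires = n + nu * n
    stream = hashlib.shake_256(seed).digest(2 * wires * KAPPA_BYTES)
    keys = [stream[i:i + KAPPA_BYTES] for i in range(0, len(stream), KAPPA_BYTES)]
    return list(zip(keys[0::2], keys[1::2]))


def commit(key: bytes) -> bytes:
    """Stand-in for H: hash of a wire key, truncated to kappa bits."""
    return hashlib.sha256(key).digest()[:KAPPA_BYTES]


def check_commitments(seed: bytes, commitments: list[tuple[bytes, bytes]],
                      n: int, nu: int) -> bool:
    """Receiver-side check for one opened circuit: recompute the keys of the
    sender's n input wires from the seed and compare their hashes with the
    commitments, which arrive in a random order per wire."""
    keys = expand_seed(seed, n, nu)[:n]
    return len(commitments) == n and all(
        {commit(k0), commit(k1)} == set(pair)
        for (k0, k1), pair in zip(keys, commitments)
    )
```

Sending a single kappa-bit seed per check circuit in place of 2(n + nu*n) keys is what yields the second saving listed among the modifications.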


V. Kolesnikov and A.J. Malozemoff 


Our PVC Protocol and Its Cost. Fig. 6 presents our optimized protocol. 
For simplicity, we sign each message in Steps 5 and 6 separately; however, we 
note that we can group all the messages in a given step into a single signature 
(cf. Observation 2). The Blame and Judgment algorithms are straightforward 
and similar to the AO protocol (Blame outputs the relevant parts of the view, 
including the cheater’s signatures, and Judgment checks the signatures). We 
prove the following theorem in the full version. 

Theorem 4. Let $\lambda < p(n)$ and $\nu < p(\kappa)$, for some polynomial $p(\cdot)$, be parameters
to the protocol, and set $\epsilon = (1 - 1/\lambda)(1 - 2^{-\nu+1})$. Let $f$ be a ppt function, let
$H$ be a random oracle, and let $\mathcal{F}^{\Pi}_{signedOT}$ and $\binom{\lambda}{1}$-$\mathcal{F}^{\Pi}_{signedOT}$ be the $\binom{2}{1}$-signed-OT
and $\binom{\lambda}{1}$-signed-OT ideal functionalities, respectively, where $\Pi$ is an EU-CMPRA
signature scheme. Then the protocol in Fig. 6 securely computes $f$ in the presence
of (1) an $\epsilon$-PVC adversary corrupting $P_1$ and (2) a malicious adversary
corrupting $P_2$.
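
The deterrence factor in the theorem is easy to evaluate exactly. The following sketch (the function name `deterrence_factor` is ours) uses rational arithmetic to confirm that a GC replication factor of 3 together with an XOR-tree replication factor of 3 gives a deterrence factor of 1/2, the setting used in the examples.

```python
from fractions import Fraction


def deterrence_factor(lam: int, nu: int) -> Fraction:
    """epsilon = (1 - 1/lambda) * (1 - 2^(-nu + 1)), computed exactly."""
    return (1 - Fraction(1, lam)) * (1 - Fraction(1, 2 ** (nu - 1)))


print(deterrence_factor(3, 3))
```

The first factor is the probability that a bad garbled circuit lands among the opened check circuits; the second accounts for selective-failure attacks on the XOR-tree shares.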

Using our AES circuit example, we find that the total communication cost is 
now 2.5 Mbit, plus the cost of signed-OT/signed-OT extension. In this particu- 
lar example, signed-OT requires around 1 Mbit and signed-OT extension requires 
around 1.4 Mbit. However, as we show below, as the number of OTs required 
grows, signed-OT extension quickly outperforms signed-OT, both in communi- 
cation and computation. 

5 Comparison with Prior Work 

We now compare our signed-OT extension construction (including optimizations, 
and in particular, the signature batching of Observation 2) with the signed-OT 
protocol of Asharov and Orlandi [7], along with a comparison of existing covert 
and malicious protocols and our PVC protocol using both signed-OT and signed- 
OT extension. All comparisons are done through calculating the number of bits 
transferred and estimated running times based on the relative cost of public-key 
versus symmetric-key operations. We use a very conservative (low-end) estimate 
on the public/symmetric speed ratio. We note that this ratio does vary greatly 
across platforms, being much higher on low power mobile devices, which often 
employ a weak CPU but have hardware AES support. For such platforms our 
numbers would be even better. 

Recall that $\tau$ is the field size (in bits), $\nu$ is the XOR-tree replication factor,
$\lambda$ is the GC replication factor, $n$ is the input length, and we assume that each
signature is of length $\tau$.

Communication Cost. We first focus on the communication cost of the two
protocols. The signed-OT protocol of Asharov and Orlandi [7] is based on the
maliciously secure OT protocol of Peikert et al. [18], and inherits similar costs.
Namely, the communication cost of executing $\ell$ OTs each of length $n$ is $(6\ell + 11)\tau$
if $n \le \tau$, and $(6\ell + 11)\tau + 2n\ell$ if $n > \tau$. Signed-OT requires the additional
communication of a signature per OT, adding an additional $\tau\ell$ bits. In the
underlying secure computation protocol we have that $n = \lambda\kappa$, where $\lambda$ is the
garbled circuit replication factor. For simplicity, we set $\lambda = 3$ (which along with
an XOR-tree replication factor of three equates to a deterrence factor of $\epsilon = 1/2$)
and thus $n = 3\kappa$. Thus, the total communication cost of executing $t$ signed-OTs
is $\tau(7t + 11)$ bits if $3\kappa \le \tau$ and $\tau(7t + 11) + 6\kappa t$ bits otherwise.
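
The signed-OT cost is simple enough to tabulate directly. The sketch below evaluates it for 1,000 and 10,000 OTs; the function name `signed_ot_comm_bits` is ours, and the field sizes of 1024 and 3072 bits (FFC) and 160 bits (short ECC) are assumptions, chosen because they reproduce the sOT columns of Fig. 7 (the text only states a field size of 256 bits for elliptic curves at the long security parameter, and security parameters of 80 and 128 bits).

```python
def signed_ot_comm_bits(t: int, tau: int, kappa: int, lam: int = 3) -> int:
    """Bits sent for t signed-OTs on messages of length n = lam * kappa."""
    n = lam * kappa
    bits = tau * (7 * t + 11)   # base OT cost plus one signature per OT
    if n > tau:
        bits += 2 * n * t       # long messages must be sent explicitly
    return bits


# (label, tau, kappa); tau values other than 256 are assumptions, see above.
SETTINGS = [
    ("Short (FFC)", 1024, 80),
    ("Short (ECC)", 160, 80),
    ("Long (FFC)", 3072, 128),
    ("Long (ECC)", 256, 128),
]

sot_kbits = {
    label: [(signed_ot_comm_bits(t, tau, kappa) + 500) // 1000 for t in (1_000, 10_000)]
    for label, tau, kappa in SETTINGS
}
for label, (small, large) in sot_kbits.items():
    print(f"{label:12s} {small:>8,} {large:>8,}")
```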

On the other hand, the cost of signed-OT extension for $t$ OTs is $(6\ell + 11)\tau +
2\ell t + \ell t + \mu\ell\log\ell + 4\mu\ell\kappa + \kappa\log\ell + (n + \kappa)t + \tau$. Asharov et al. [6, §3.2] present
concrete choices of $\mu$ and $\ell$ for various security parameters. However, in our
setting we need to increase $\ell$ by $\kappa$ bits. Thus, let $\ell'$ be the particular choice
of $\ell$ specified by Asharov et al. We then set $\ell = \ell' + \kappa$. Thus, for short security
parameter we set $\ell = 133 + 80 = 213$ and $\mu = 3$, and for long security
parameter we set $\ell = 190 + 128 = 318$ and $\mu = 2$. Thus, the total communication
cost of executing $t$ signed-OTs when using signed-OT extension is
$(6\ell + 12)\tau + (3\ell + n + \kappa)t + \mu\ell\log\ell + 4\mu\ell\kappa + \kappa\log\ell$ bits.


                         1,000 OTs                       10,000 OTs
Security       sOT      sOT-ext  Improvement    sOT       sOT-ext  Improvement
Short (FFC)    7,179    2,539    2.8x           71,691    11,305   6.3x
Short (ECC)    1,602    1,398    1.1x           16,002    10,164   1.6x
Long (FFC)     21,538   7,694    2.8x           215,074   20,888   10.3x
Long (ECC)     2,563    2,288    1.1x           25,603    15,482   1.7x


Fig. 7. Communication cost (in kbits) of transferring the input wire labels for $P_2$ when
using signed-OT (sOT) versus signed-OT extension (sOT-ext) for 1,000 and 10,000
OTs.


Figure 7 presents a comparison of the communication cost of both approaches
when executing 1,000 and 10,000 OTs, for various keylength settings and underlying
public-key cryptosystems. We see improvements from 1.1-10.3x, depending
on the number of OTs, the underlying public-key cryptosystem, and the size of
the security parameter. Note that for a smaller number of OTs (such as 100),
signed-OT is more efficient, which makes sense due to the overhead of OT extension
and the need to compute the base OTs. However, as the number of OTs
grows, we see that signed-OT extension is superior across the board.

Computational Cost. We now look at the computational cost of the two protocols.
Let $\xi$ denote the cost of a public-key operation (we assume exponentiations
and signing take the same amount of time), and let $\zeta$ denote the cost of
a symmetric-key operation (where we let $\zeta$ denote the cost of operating over $\kappa$
bits; e.g., hashing a $2\kappa$-bit value costs $2\zeta$). We assume all other operations are
“free”. This is obviously a very coarse analysis; however, it gives a general idea
of the performance characteristics of the two approaches.

The cost of executing $\ell$ OTs on $n$-bit messages is $(14\ell + 12)\xi$ if $n \le \tau$ and
$(14\ell + 12)\xi + 2\ell\frac{n}{\kappa}\zeta$ if $n > \tau$. Signed-OT requires an additional $2\ell\xi$ operations
(for signing and verifying). We again set $n = 3\kappa$, and thus the cost of executing
$t$ signed-OTs is $(16t + 12)\xi$ if $3\kappa \le \tau$ and $(16t + 12)\xi + 6t\zeta$ otherwise.

The cost of our signed-OT extension protocol for $t$ OTs (where we assume
$t > \kappa$ and we hash the input prior to signing in Step 4) is $\frac{\ell}{\kappa}t\zeta + (14\ell + 12)\xi +
[\ldots] + 6\frac{\ell}{\kappa}\mu t\zeta + 2\log\ell\,\zeta + [\ldots] + 2\xi$. As above, we set $\ell = 213$ and $\mu = 3$ for
short security parameter, $\ell = 318$ and $\mu = 2$ for long security parameter, and
$n = 3\kappa$. Thus, the cost of executing $t$ signed-OTs is $(14\ell + 14)\xi + ((5 + 6\mu)\frac{\ell}{\kappa}
+ 8)t\zeta + 2\log\ell\,\zeta$.


                         1,000 OTs                       10,000 OTs
Security       sOT      sOT-ext  Improvement    sOT       sOT-ext  Improvement
Short (FFC)    16.0     3.1      5.1x           160.0     3.8      42.4x
Short (ECC)    5.3      1.1      4.9x           53.3      1.7      30.9x
Long (FFC)     144.1    40.2     3.6x           1440.1    40.7     35.4x
Long (ECC)     14.4     4.1      3.5x           144.1     4.5      31.9x


Fig. 8. Computation cost (in millions of “time units”) of transferring the input wire
labels for $P_2$ when using signed-OT (sOT) versus signed-OT extension (sOT-ext) for
1,000 and 10,000 OTs. We assume symmetric-key operations take 1 “time unit”, FFC
(resp., ECC) operations take 1000 (resp., 333) “time units” for the short security
parameter, and FFC (resp., ECC) operations take 9000 (resp., 900) “time units” for
the long security parameter [1].


Figure 8 presents a comparison of the computational cost of both approaches
when executing 1,000 and 10,000 OTs, for various keylength settings and underlying
public-key cryptosystems. Here we see that regardless of the number of
OTs and public-key cryptosystem used, signed-OT extension is (often much)
more efficient, and as the number of OTs increases so does this improvement.
For as few as 1,000 OTs we already see a 3.5-5.1x improvement, and for 10,000
OTs we see a 30.9-42.4x improvement.
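
The signed-OT column of Fig. 8 follows directly from the stated cost of 16t + 12 public-key operations plus 6t symmetric-key operations when the message length exceeds the field size. The sketch below recomputes it; the function name `signed_ot_time_units` is ours, the per-operation costs are those in the caption of Fig. 8, and the field sizes of 1024, 160 and 3072 bits are assumptions (only the 256-bit elliptic-curve setting is stated in the text).

```python
def signed_ot_time_units(t: int, tau: int, kappa: int, pk_cost: int,
                         sym_cost: int = 1, lam: int = 3) -> int:
    """Time units for t signed-OTs on messages of length n = lam * kappa."""
    n = lam * kappa
    units = (16 * t + 12) * pk_cost                  # OT, signing and verification
    if n > tau:
        units += 2 * t * (n // kappa) * sym_cost     # masking the long messages
    return units


# (label, tau, kappa, public-key cost); tau values other than 256 are assumptions.
SETTINGS = [
    ("Short (FFC)", 1024, 80, 1000),
    ("Short (ECC)", 160, 80, 333),
    ("Long (FFC)", 3072, 128, 9000),
    ("Long (ECC)", 256, 128, 900),
]

sot_millions = {
    label: [round(signed_ot_time_units(t, tau, kappa, pk) / 1e6, 1) for t in (1_000, 10_000)]
    for label, tau, kappa, pk in SETTINGS
}
for label, (small, large) in sot_millions.items():
    print(f"{label:12s} {small:>8} {large:>8}")
```

Since the public-key term grows linearly in t while OT extension keeps the number of public-key operations fixed at the base OTs, the gap between the two approaches widens as t grows.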

Comparing Covert, PVC, and Malicious Protocols. We now compare
the computation cost of our PVC protocol in Fig. 6, using both signed-OT and
signed-OT extension, with the covert protocol of Goyal et al. [11] and the malicious
protocol of Lindell [17] (see footnote 7).

Figure 9 presents a comparison of the computation cost of our protocol using
both signed-OT (Ours$^{sOT}$) and signed-OT extension (Ours$^{sOT\text{-}ext}$), as well as
comparisons to the Goyal et al. protocol (GMS) and Lindell protocol (Lin). Due
to lack of space, the detailed cost formulas appear in the full version. We fix
$\kappa = 128$, $\lambda = \nu = 3$ (giving a deterrence factor of $\epsilon = 1/2$), and assume the

7 Lindell’s malicious protocol can also be adapted into a covert protocol; however, we 
found that the computation cost is much more than that of Goyal et al., at least for 
deterrence factor 1/2. 
                                                    GMS /            Ours^sOT /       Lin /
f                    # inputs  # gates              Ours^sOT-ext     Ours^sOT-ext     Ours^sOT-ext
16384-bit Comp.      16,384    32,229               0.85-0.73        17.1-86.7        357.0-1887.2
Hamming 16000        16,000    97,175               0.90-0.79        11.0-67.0        224.7-1408.4
16x16 Matrix Mult.   8192      4,186,368            1.00-0.98        1.2-3.1          14.2-54.3
1024-bit Sum         1,024     2,977                0.71-0.61        6.7-10.2         166.6-258.2
1024-bit Mult.       1,024     6,371,746            1.00-0.99        1.0-1.2          10.1-13.9
1024-bit RSA         1,024     15,149,856,895       1.00-1.00        1.0-1.0          9.6-9.6


Fig. 9. Ratio of computation cost of various secure computation protocols with our
signed-OT extension construction, using a deterrence factor of 1/2 for the covert and
PVC protocols. GMS denotes the covert protocol of Goyal et al. [11], Ours$^{sOT}$ denotes
the optimized Asharov-Orlandi protocol run using signed-OT, Ours$^{sOT\text{-}ext}$ denotes
the same protocol using signed-OT extension, and Lin denotes Lindell’s malicious
protocol [17]. We let $f$ denote the function being computed, # inputs denote the
number of input bits required as input by $P_2$, and # gates denote the number of
non-XOR gates in the resulting circuit. All circuit information is taken from the PCF
compiler [14, Table 5]. We report each ratio as a range; the first number uses $\xi = 125$
as the cost of public-key operations and the second number uses $\xi = 1250$, where we
assume a symmetric-key operation costs $\zeta = 1$.


use of elliptic curve cryptography (and thus $\tau = 256$). We expect public-key
operations to take between 125-1250x more than symmetric-key operations,
depending on implementation details, whether one uses AES-NI, etc. This range
is a very conservative estimate using the Crypto++ benchmark [2], experiments
using OpenSSL, and estimated ratios of running times between finite field and
elliptic curve cryptography [1].

When comparing against GMS, we find that Ours$^{sOT\text{-}ext}$ is slightly more
expensive, due almost entirely to the larger number of base OTs in the signed-OT
extension. We note that in practice, however, a deterrence factor of 1/2 may not
be sufficient for a covert protocol but may be sufficient for a PVC protocol, due
to the latter’s ability to “name-and-shame” the perpetrator. When increasing
the deterrence factor for the covert protocol to $\epsilon \approx 0.9$, the cost ratios favor
Ours$^{sOT\text{-}ext}$. For example, for 16x16 matrix multiplication, the ratio becomes
3.60-3.53x, depending on the cost of public-key operations (versus 1.00-0.98x).

Comparing Ours$^{sOT\text{-}ext}$ with Ours$^{sOT}$, we find that the former is 1.0-86.7x
more efficient, depending largely on the characteristics of the underlying circuit.
For circuits with a large number of inputs but a relatively small number of gates
(e.g., 16384-bit Comp., Hamming 16000, and 1024-bit Sum) this difference is
greatest, which makes sense, as the cost of the OT operations dominates. The
circuits for which the ratio is around 1.0 (e.g., 1024-bit RSA) are those that have
a huge number of gates compared to the number of inputs, and thus the cost of
processing the GC far outweighs the cost of signed-OT/signed-OT extension.

Finally, comparing Ours$^{sOT\text{-}ext}$ with Lin, the former is 9.6-1887.2x more
efficient, again depending in a large part on the characteristics of the circuit.
We see that for circuits with a large number of inputs this difference is starkest;
e.g., for the Hamming 16000 circuit, we get an improvement of 224.7-1408.4x.
The reason we see such large improvements for these circuits is that Lin requires
cut-and-choose oblivious transfer, which cannot take advantage of OT extension.
Thus, the number of public-key operations is huge compared to the circuit size,
and this cost has a large impact on the overall running time. Note, however, that
even for circuits where the number of gates dominates, we still see a relatively
significant improvement (e.g., 14.2-54.3x for 16x16 Matrix Mult.). These results
demonstrate that for settings where public shaming is enough of a deterrent from
cheating, Ours$^{sOT\text{-}ext}$ presents a better choice than malicious protocols.
Abstract. Extractability, or “knowledge,” assumptions have recently
gained popularity in the cryptographic community, leading to the study
of primitives such as extractable one-way functions, extractable hash
functions, succinct non-interactive arguments of knowledge (SNARKs),
and (public-coin) differing-inputs obfuscation ((PC-)diO), and spurring
the development of a wide spectrum of new applications relying on
these primitives. For most of these applications, it is required that the
extractability assumption holds even in the presence of attackers receiving
some auxiliary information that is sampled from some fixed efficiently
computable distribution Z.

We show that, assuming the existence of public-coin collision-resistant
hash functions, there exists an efficient distribution Z such that either

- PC-diO for Turing machines does not exist, or

- extractable one-way functions w.r.t. auxiliary input Z do not exist.

A corollary of this result shows that additionally assuming existence of
- SNARKs for NP w.r.t. auxiliary input Z do not exist, or
To achieve our results, we develop a “succinct punctured program” technique,
mirroring the powerful punctured program technique of Sahai and
Waters (STOC’14), and present several other applications of this new
technique. In particular, we construct succinct perfect zero knowledge
SNARGs and give a universal instantiation of random oracles in full-domain
hash applications, based on PC-diO.

As a final contribution, we demonstrate that even in the absence of
auxiliary input, care must be taken when making use of extractability
assumptions. We show that (standard) diO w.r.t. any distribution $\mathcal{D}$ over
programs and bounded-length auxiliary input is directly implied by any
obfuscator that satisfies the weaker indistinguishability obfuscation (iO)
security notion and diO for a slightly modified distribution $\mathcal{D}'$ of programs
(of slightly greater size) and no auxiliary input. As a consequence,
we directly obtain negative results for (standard) diO in the absence of
auxiliary input.


1 Introduction 

Extractability Assumptions. Extractability, or “knowledge,” assumptions (such
as the “knowledge-of-exponent” assumption) have recently gained in popularity,
leading to the study of primitives such as extractable one-way functions,
extractable hash functions, SNARKs (succinct non-interactive arguments
of knowledge), and differing-inputs obfuscation:

- Extractable OWF: An extractable family of one-way (resp. collision-resistant)
functions [14,15,27] is a family of one-way (resp. collision-resistant)
functions $\{f_i\}$ such that any attacker who outputs an element $y$ in the range
of a randomly chosen function $f_i$ given the index $i$ must “know” a pre-image
$x$ of $y$ (i.e., $f_i(x) = y$). This is formalized by requiring, for every adversary $A$,
the existence of an “extractor” $\mathcal{E}$ that (with overwhelming probability), given
the view of $A$, outputs a pre-image $x$ whenever $A$ outputs an element $y$ in the
range of the function.

For example, the “knowledge-of-exponent” assumption of Damgård [15] stipulates
the existence of a particular such extractable one-way function.

- SNARKs: Succinct non-interactive arguments of knowledge (SNARKs)
[5,32,35] are communication-efficient (i.e., “short” or “succinct”) arguments
for NP with the property that if a prover generates an accepting (short) proof,
it must “know” a corresponding (potentially long) witness for the statement
proved, and this witness can be efficiently “extracted” out from the prover.

- Differing-inputs Obfuscation: [1,2,10] A differing-inputs obfuscator $O$ for
program-pair distribution $\mathcal{D}$ is an efficient procedure which ensures that if any
efficient attacker $A$ can distinguish obfuscations $O(C_1)$ and $O(C_2)$ of programs
$C_1, C_2$ generated via $\mathcal{D}$ given the randomness $r$ used in sampling, then it must
“know” an input $x$ such that $C_1(x) \neq C_2(x)$, and this input can be efficiently
“extracted” from $A$.
A recently proposed (weaker) variant known as public-coin differing-inputs
obfuscation [30] additionally provides the randomness used to sample the programs
$(C_0, C_1) \leftarrow \mathcal{D}$ to the extraction algorithm (and to the attacker $A$).

The above primitives have proven extremely useful in constructing cryptographic 
tools for which instantiations under complexity-theoretic hardness assumptions 
are not known (e.g., [1,5,10,16,24,27,30]). 

Extraction with (Distribution-Specific) Auxiliary Input. In all of these applications,
we require a notion of an auxiliary-input extractable one-way function
[14,27], where both the attacker and the extractor may receive an auxiliary
input. The strongest formulation requires extractability in the presence of an
arbitrary auxiliary input. Yet, as informally discussed already in the original
work by Hada and Tanaka [27], extractability w.r.t. an arbitrary auxiliary input
is an “overly strong” (or in the language of [27], “unreasonable”) assumption.
Indeed, a recent result of Bitansky, Canetti, Rosen and Paneth [7] (formalizing
earlier intuitions from [5,27]) demonstrates that assuming the existence of
indistinguishability obfuscators for the class of polynomial-size circuits (see
footnote 1), there cannot exist auxiliary-input extractable one-way functions
that remain secure for an arbitrary auxiliary input.

However, for most of the above applications, we actually do not require
extractability to hold w.r.t. an arbitrary auxiliary input. Rather, as proposed
by Bitansky et al. [5,6], it often suffices to consider extractability with respect
to specific distributions $Z$ of auxiliary input (see footnote 2). More precisely, it
would suffice to show that for every desired output length $\ell(\cdot)$ and distribution $Z$ there
exists a function family $\mathcal{F}_Z$ (which, in particular, may be tailored for $Z$) such
that $\mathcal{F}_Z$ is a family of extractable one-way (or collision-resistant) functions
$\{0,1\}^k \to \{0,1\}^{\ell(k)}$ with respect to $Z$. In fact, for some of these results (e.g.,
[5,6]), it suffices to assume that extraction works just for the uniform
distribution.

In contrast, the result of [7] can be interpreted as saying that (assuming iO),
there do not exist extractable one-way functions with respect to every distribution
of auxiliary input: That is, for every candidate extractable one-way function
family $\mathcal{F}$, there exists some distribution $Z_{\mathcal{F}}$ of auxiliary input that breaks it.

1 The notion of indistinguishability obfuscation [2] requires that obfuscations $O(C_1)$
and $O(C_2)$ of any two equivalent circuits $C_1$ and $C_2$ (i.e., whose outputs agree on
all inputs) from some class $\mathcal{C}$ are computationally indistinguishable. A candidate
construction for general-purpose indistinguishability obfuscation was recently given
by Garg et al. [18].

2 As far as we know, the only exceptions are in the context of zero-knowledge simulation,
where the extractor is used in the simulation (as opposed to being used as part
of a reduction), and we require simulation w.r.t. arbitrary auxiliary inputs. Nevertheless,
as pointed out in the works on zero-knowledge [26,27], to achieve “plain” zero-knowledge
[3,25] (where the verifier does not receive any auxiliary input), weaker
“bounded” auxiliary input assumptions suffice.
Our Results. In this paper, we show limitations of extractability primitives with
respect to distribution-specific auxiliary input (assuming the existence of public-coin
collision-resistant hash functions (CRHF) [29]). Our main result shows a
conflict between public-coin differing-inputs obfuscation for Turing machines [30]
and extractable one-way functions.

Theorem 1 (Main Theorem, Informal). Assume the existence of public-coin
collision-resistant hash functions. Then for every polynomial $\ell$, there exists
an efficiently computable distribution $Z$ such that one of the following two primitives
does not exist:

- extractable one-way functions $\{0,1\}^k \to \{0,1\}^{\ell(k)}$ w.r.t. auxiliary input from $Z$.

- public-coin differing-inputs obfuscation for Turing machines.

By combining our main theorem with results from [5,30], we obtain the
following corollary:

Theorem 2 (Informal). Assume the existence of public-coin CRHF and fully
homomorphic encryption with decryption in NC$^1$ (see footnote 3). Then there exists an
efficiently computable distribution $Z$ such that one of the following two primitives
does not exist:

- SNARKs w.r.t. auxiliary input from $Z$.

- public-coin differing-inputs obfuscation for NC$^1$ circuits.

To prove our results, we develop a new proof technique, which we refer to as the 
“succinct punctured program” technique, extending the “punctured program” 
paradigm of Sahai and Waters [34]; see Sect. 1.1 for more details. This technique 
has several other interesting applications, as we discuss in Sect. 1.3. 

As a final contribution, we demonstrate that even in the absence of auxiliary
input, care must be taken when making use of extractability assumptions.
Specifically, we show that differing-inputs obfuscation (diO) for any distribution
$\mathcal{D}$ of programs and bounded-length auxiliary inputs is directly implied
by any obfuscator that satisfies a weaker indistinguishability obfuscation (iO)
security notion (which is not an extractability assumption) and diO security
for a related distribution $\mathcal{D}'$ of programs (of slightly greater size) which does
not contain auxiliary input. Thus, negative results ruling out existence of diO
with bounded-length auxiliary input directly imply negative results for diO in a
setting without auxiliary input.

Theorem 3 (Informal). Let $\mathcal{D}$ be a distribution over pairs of programs and
$\ell$-bounded auxiliary input information $\mathcal{P} \times \mathcal{P} \times \{0,1\}^{\ell}$. There exists diO with
respect to $\mathcal{D}$ if there exists an obfuscator satisfying iO in addition to diO with
respect to a modified distribution $\mathcal{D}'$ over $\mathcal{P}' \times \mathcal{P}'$ for slightly enriched program
class $\mathcal{P}'$, and no auxiliary input.


3 As is the case for nearly all existing FHE constructions (e.g., [13,21]). 
Our transformation applies to a recent result of Garg et al. [20], which shows
that based on a new assumption (pertaining to special-purpose obfuscation of
Turing machines) general-purpose diO w.r.t. auxiliary input cannot exist, by
constructing a distribution over circuits and bounded-length auxiliary inputs for
which no obfuscator can be diO-secure. Our resulting conclusion is that, assuming
such special-purpose obfuscation exists, then general-purpose diO cannot
exist, even in the absence of auxiliary input.

We view this as evidence that public-coin differing inputs may be the “right” 
approach definitionally, as restrictions on auxiliary input without regard to the 
programs themselves will not suffice. 

Interpretation of Our Results. Our results suggest that one must take care
when making extractability assumptions, even in the presence of specific distributions
of auxiliary inputs, and in certain cases even in the absence of auxiliary
input. In particular, we must develop a way to distinguish “good” distributions of
instances and auxiliary inputs (for which extractability assumptions may make
sense) and “bad” ones (for which extractability assumptions are unlikely to hold).
As mentioned above, for some applications of extractability assumptions, it in
fact suffices to consider a particularly simple distribution of auxiliary inputs,
namely the uniform distribution (see footnote 4). We emphasize that our results do not present
any limitations of extractable one-way functions in the presence of uniform auxiliary
input, and as such, this still seems like a plausible assumption.

Comparison to [20]. An interesting subsequent⁵ work of Garg et al. [19,20]
contains a related study of differing-inputs obfuscation. In [20], the authors
propose a new “special-purpose” circuit obfuscation assumption, and demonstrate
based on this assumption an auxiliary input distribution (whose size grows with
the desired size of the circuits to be obfuscated) for which general-purpose
diO cannot exist. Using similar techniques of hashing and obfuscating Turing
machines as in the current work, they further conclude that if the new
obfuscation assumption holds also for Turing machines, then the “bad” auxiliary input
distribution can have bounded length (irrespective of the circuit size).

Garg et al. [20] show the “special-purpose” obfuscation assumption is a fal- 
sifiable assumption (in the sense of [33]) and is implied by virtual black-box 
obfuscation for the relevant restricted class of programs, but plausibility of the 
notion in relation to other primitives is otherwise unknown. In contrast, our 
results provide a direct relation between existing, studied topics (namely, diO, 
EOWFs, and SNARKs). Even in the case that the special-purpose obfuscation 
assumption does hold, our primary results provide conclusions for public-coin
diO, whereas Garg et al. [20] consider (stronger) standard diO, with respect to
auxiliary input.


4 Note that this is not the case for all applications; e.g. [11,23,26,27] require consid- 
ering more complicated distributions. 

5 A version of our paper with Theorems 1 and 2 for (standard) differing-inputs
obfuscation in the place of public-coin diO has been on ePrint since October 2013 [12].

And, utilizing our final observation (which occurred subsequent to [20]), we
show that based on their same special-purpose obfuscation assumption for Turing
machines, we can in fact rule out general-purpose diO for circuits even in the
absence of auxiliary input.


1.1 Proof Techniques 

To explain our techniques, let us first explain earlier arguments against the
plausibility of extractable one-way functions with auxiliary input. For simplicity
of notation, we focus on extractable one-way functions over $\{0,1\}^k \to \{0,1\}^k$ (as
opposed to over $\{0,1\}^k \to \{0,1\}^{\ell(k)}$ for some polynomial $\ell$), but emphasize that
the approach described directly extends to the more general setting.

Early Intuitions. As mentioned above, already the original work of Hada and 
Tanaka [27], which introduced auxiliary input extractable one-way functions 
(EOWFs) (for the specific case of exponentiation), argued the “unreasonable- 
ness” of such functions, reasoning informally that the auxiliary input could con- 
tain a program that evaluates the function, and thus a corresponding extractor 
must be able to “reverse-engineer” any such program. Bitansky et al. [5] made
this idea more explicit: Given some candidate EOWF family $\mathcal{F}$, consider the
distribution $Z_{\mathcal{F}}$ over auxiliary input formed by “obfuscating” a program $\Pi^{s}(\cdot)$
for uniformly chosen s, where $\Pi^{s}(\cdot)$ takes as input a function index i from the
alleged EOWF family $\mathcal{F} = \{f_i\}$, applies a pseudorandom function (PRF) with
hardcoded seed s to the index i, and then outputs the evaluation $f_i(\mathrm{PRF}_s(i))$.
Now, consider an attacker A who, given an index i, simply runs the obfuscated
program to obtain a “random” point in the range of $f_i$. If it were possible to
obfuscate $\Pi^{s}$ in a “virtual black-box (VBB)” way (as in [2]), then it easily
follows that any extractor $\mathcal{E}$ for this particular attacker A can invert $f_i$. Intuitively,
the VBB-obfuscated program hides the PRF seed s (revealing, in essence, only
black-box access to $\Pi^{s}$), and so if $\mathcal{E}$ can successfully invert $f_i$ on A’s output
$f_i(\mathrm{PRF}_s(i))$ on a pseudorandom input $\mathrm{PRF}_s(i)$, it must also be able to invert
for a truly random input. Formally, given an index i and a random point y in
the image of $f_i$, we can “program” the output of $\Pi^{s}(i)$ to simply be y, and thus
$\mathcal{E}$ will be forced to invert y.

The problem with this argument is that (as shown by Barak et al. [2]), for 
large classes of functions VBB program obfuscation simply does not exist. 

The Work of [7] and the “Punctured Program” Paradigm of [34]. Intriguingly,
Bitansky, Canetti, Rosen and Paneth [7] show that by using a particular PRF 
and instead relying on indistinguishability obfuscation, the above argument still 
applies! To do so, they rely on the powerful “punctured-program” paradigm of 
Sahai and Waters [34] (and the closely related work of Hohenberger, Sahai and 
Waters [28] on “instantiating random oracles”). Roughly speaking, the
punctured-program paradigm shows that if we use indistinguishability obfuscation
to obfuscate a (function of a) special kind of “puncturable” PRF⁶ [8,11,31],
we can still “program” the output of the program on one input (which was
used in [28,34] to show various applications of indistinguishability obfuscation).
Bitansky et al. [7] show that by using this approach, from any alleged
extractor $\mathcal{E}$ we can construct a one-way function inverter Inv by
“programming” the output of the program $\Pi^{s}$ at the input i with the challenge value y.
More explicitly, mirroring [28,34], they consider a hybrid experiment where $\mathcal{E}$ is
executed with fake (but indistinguishable) auxiliary input, formed by
obfuscating a “punctured” variant $\Pi^{s}_{i,y}$ of the program $\Pi^{s}$ that contains an i-punctured
PRF seed $s^*$ (enabling evaluation of $\mathrm{PRF}_s(j)$ for any $j \neq i$) and directly outputs
the hardcoded value $y := f_i(\mathrm{PRF}_s(i))$ on input i: indistinguishability of this
auxiliary input follows by the security of indistinguishability obfuscation since the
programs $\Pi^{s}_{i,y}$ and $\Pi^{s}$ are equivalent when $y = f_i(\mathrm{PRF}_s(i)) = \Pi^{s}(i)$. In a
second hybrid experiment, the “correct” hardcoded value y is replaced by a random
evaluation $f_i(u)$ for uniform u; here, indistinguishability of the auxiliary inputs
follows directly by the security of the punctured PRF. Finally, by
indistinguishability of the three distributions of auxiliary input in the three experiments, it
must be that $\mathcal{E}$ can extract an inverse to y with non-negligible probability in each
hybrid; but, in the final experiment this implies the ability to invert a random
evaluation, breaking one-wayness of the EOWF.

The Problem: Dependence on $\mathcal{F}$. Note that in the above approach, the auxiliary
input distribution is selected as a function of the family $\mathcal{F} = \{f_j\}$ of (alleged)
extractable one-way functions. Indeed, the obfuscated program $\Pi^{s}$ must be able
to evaluate $f_j$ given j. One may attempt to mitigate this situation by instead
obfuscating a universal circuit that takes as input both $\mathcal{F}$ and the index j,
and appropriately evaluates $f_j$. But here still the size of the universal circuit
must be greater than the running time of $f_j$, and thus such an auxiliary input
distribution would only rule out EOWFs with a-priori bounded running time.
This does not suffice for what we aim to achieve: in particular, it still leaves open 
the possibility that for every distribution of auxiliary inputs, there may exist a 
family of extractable one-way functions that remains secure for that particular 
auxiliary input distribution (although the running time of the extractable one- 
way function needs to be greater than the length of the auxiliary input). 

A First Idea: Using Turing Machine Obfuscators. At first sight, it would appear 
this problem could be solved if we could obfuscate Turing machines. Namely, by 
obfuscating a universal Turing machine in the place of a universal circuit in the 
construction above, the resulting program $\Pi^{s}$ would depend only on the size of
the PRF seed s, and not on the runtime of $f_j \in \mathcal{F}$.

But there is a catch. To rely on the punctured program paradigm, we must be
able to obfuscate the program $\Pi^{s}$ in such a way that the result is indistinguishable
from an obfuscation of a related “punctured” program $\Pi^{s}_{i,y}$; in particular, the
size of the obfuscation must be at least as large as $|\Pi^{s}_{i,y}|$. Whereas the size of $\Pi^{s}$
is now bounded by a polynomial in the size of the PRF seed s, the description of
this punctured program must specify a punctured input i (corresponding to an
index of the candidate EOWF $\mathcal{F}$) and hardcoded output value y, and hence must
grow with the size of $\mathcal{F}$. We thus run into a similar wall: even with obfuscation
of Turing machines, the resulting auxiliary input distribution Z would only rule
out EOWF with a-priori bounded index length.

6 That is, a PRF where we can surgically remove one point in the domain of the
PRF, keeping the rest of the PRF intact, and yet, even if we are given the seed of
the punctured PRF, the value of the original PRF on the surgically removed point
remains computationally indistinguishable from random.

Our “Succinct Punctured Program” Technique. To deal with this issue, we
develop a “succinct punctured program” technique. That is, we show how to
make the size of the obfuscation be independent of the length of the input, while
still retaining its usability as an obfuscator. The idea is two-fold: First, we modify
the program $\Pi^{s}$ to hash the input to the PRF, using a collision-resistant hash
function h. That is, we now consider a program $\Pi^{h,s}(j) = f_j(\mathrm{PRF}_s(h(j)))$.
Second, we make use of differing-inputs obfuscation, as opposed to just
indistinguishability obfuscation. Specifically, our constructed auxiliary input
distribution Z will sample a uniform s and a random hash function h (from some
appropriate collection of collision-resistant hash functions) and then output a
differing-inputs obfuscation of $\Pi^{h,s}$.

To prove that this “universal” distribution Z over auxiliary input breaks all
alleged extractable one-way functions over $\{0,1\}^k \to \{0,1\}^k$, we define a
one-way function inverter Inv just as before, except that we now feed the EOWF
extractor $\mathcal{E}$ the obfuscation of the “punctured” variant $\Pi^{h,s}_{i,y}$, which contains a
PRF seed punctured at point h(i). The program $\Pi^{h,s}_{i,y}$ proceeds just as $\Pi^{h,s}$
except on all inputs j such that h(j) is equal to this special value h(i); for those
inputs it simply outputs the hardcoded value y. (Note that the index i is no
longer needed to specify the function, only its hash h(i), but is
included for notational convenience.) As before, consider a hybrid experiment
where y is selected as $y := \Pi^{h,s}(i)$.

Whereas before the punctured program was equivalent to the original, and
thus indistinguishability of auxiliary inputs in the different experiments followed
by the definition of indistinguishability obfuscation, here it is no longer the
case that if $y = \Pi^{h,s}(i)$, then $\Pi^{h,s}_{i,y}$ is equivalent to $\Pi^{h,s}$; in fact, they may
differ on many points. More precisely, the programs may differ in all points
j such that h(j) = h(i), but $j \neq i$ (since $f_j$ and $f_i$ may differ on the input
$\mathrm{PRF}_s(h(i))$). Thus, we can no longer rely on indistinguishability obfuscation to
provide indistinguishability of these two hybrids.

We resolve this issue by relying on differing-inputs obfuscation instead of just
indistinguishability obfuscation. Intuitively, if obfuscations of $\Pi^{h,s}$ and $\Pi^{h,s}_{i,y}$ can
be distinguished when y is set to $\Pi^{h,s}(i)$, then we can efficiently recover some
input j where the two programs differ. But, by construction, this must be some
point j for which h(j) = h(i) (or else the two programs are the same), and $j \neq i$
(since we chose the hardcoded value $y = \Pi^{h,s}(i)$ to be consistent with $\Pi^{h,s}$ on
input i). Thus, if the obfuscations can be distinguished, we can find a collision in
h, contradicting its collision resistance.

To formalize this argument using just public-coin diO, we require that h is
a public-coin collision-resistant hash function [29].


1.2 Removing Auxiliary Input in diO 

The notion of public-coin diO is weaker than “general” (not necessarily
public-coin) diO in two aspects: (1) the programs $M_0, M_1$ are sampled using only public
randomness, and (2) we consider only a very specific auxiliary input that is given
to the attacker, namely the randomness of the sampling procedure.

In this section, we explore another natural restriction of diO where we simply
disallow auxiliary input, but allow for “private” sampling of $M_0, M_1$. We show
that “bad side information” cannot be circumvented simply by disallowing
auxiliary input, but rather such information can appear in the input-output
behavior of the programs to be obfuscated.

More precisely, we show that for any distribution $\mathcal{D}$ over $\mathcal{P} \times \mathcal{P} \times \{0,1\}^{\ell}$ of
programs $\mathcal{P}$ and bounded-length auxiliary input, the existence of diO w.r.t. $\mathcal{D}$ is
directly implied by the existence of any indistinguishability obfuscator (iO) that
is diO-secure for a slightly enriched distribution of programs $\mathcal{D}'$ over $\mathcal{P}' \times \mathcal{P}'$,
without auxiliary input.

Intuitively, this transformation works by embedding the “bad auxiliary input”
into the input-output behavior of the circuits to be obfuscated themselves. That
is, the new distribution $\mathcal{D}'$ is formed by sampling first a triple $(P_0, P_1, z)$ of
programs and auxiliary input from the original distribution $\mathcal{D}$, and then instead
considering the tweaked programs $P_0^z, P_1^z$ that have a special additional input
$x^*$ (denoted later as “mode = *”) for which $P_0^z(x^*) = P_1^z(x^*)$ is defined to be
z. This introduces no new differing inputs to the original program pair $P_0, P_1$,
but now there is no hope of preventing the adversary from learning z without
sacrificing correctness of the obfuscation scheme.
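
The embedding step can be illustrated with a small Python sketch. It is a toy model under stated assumptions: programs are modeled as Python callables, the special input x* is modeled by a hypothetical sentinel object STAR, and the helper names embed_aux and differing_inputs are illustrative and do not come from the paper. No obfuscation is performed; the sketch only shows that the tweak adds no new differing inputs while making z part of the input-output behavior.

```python
# Toy model of embedding auxiliary input z into the programs themselves.
# STAR, embed_aux and differing_inputs are hypothetical names for this sketch.

STAR = object()  # stands in for the special input x* ("mode = *")


def embed_aux(program, z):
    """Return the tweaked program P^z: output z on x*, otherwise behave as P."""
    def tweaked(x):
        if x is STAR:
            return z
        return program(x)
    return tweaked


def differing_inputs(p0, p1, domain):
    """Brute-force the inputs on which two programs disagree (toy domains only)."""
    return [x for x in domain if p0(x) != p1(x)]


if __name__ == "__main__":
    p0 = lambda x: x % 7
    p1 = lambda x: 0 if x == 5 else x % 7   # differs from p0 only at x = 5
    z = b"bad side information"
    p0z, p1z = embed_aux(p0, z), embed_aux(p1, z)
    print(differing_inputs(p0z, p1z, list(range(10)) + [STAR]))
    print(p0z(STAR) == z)
```

Any correct obfuscation of either tweaked program must still answer z on the special input, which is why the side information can no longer be withheld from the adversary.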

A technical challenge arises in the security reduction, however, in which we
must modify the obfuscation of the z-embedded program $P_b^z$ to “look like” an
obfuscation of the original program $P_b$. Interestingly, this issue is solved by
making use of a second layer of obfuscation, and is where the iO security of the
obfuscator is required. We refer the reader to the full version of this work for
details.

1.3 Other Applications of the “Succinct Punctured Program” 
Technique 

As mentioned above, the “punctured program” paradigm of [34] has been used 
in multiple applications (e.g., [9,17,28,34]). Many of them rely on punctured 
programs in an essentially identical way to the approach described above, and 
in particular follow the same hybrids within the security proof. Furthermore, for 
some of these applications, there are significant gains in making the obfuscation 
succinct (i.e., independent of the input size of the obfuscated program). Thus, for 
these applications, if we instead rely on public-coin differing-inputs obfuscation
(and the existence of public-coin collision-resistant hash functions), by using our
succinct punctured program technique, we can obtain significant improvements. 
For instance, relying on the same approach as above, we can show based on these 
assumptions: 

- “Succinct” Perfect Zero-Knowledge Non-Interactive Universal Argument System
(with communication complexity $k^{\epsilon}$ for every $\epsilon$), by relying on the
non-succinct Perfect NIZK construction of [34].

- A universal instantiation of Random Oracles, for which the Full Domain Hash
(FDH) signature paradigm [4] is (selectively) secure for every trapdoor
(one-to-one) function (if hashing not only the message but also the index of the
trapdoor function), by relying on the results of [28] showing how to provide a
trapdoor-function-specific instantiation of the random oracle in the FDH.⁷


1.4 Overview of Paper 

We focus in this extended abstract on the primary result: the conflict between 
public-coin differing inputs obfuscation and extractable OWFs (and SNARKs). 
Further preliminaries, applications of our succinct punctured programs tech- 
nique, and our transformation removing auxiliary input in differing-inputs obfus- 
cation are deferred to the full version [12]. 

2 Preliminaries 

2.1 Public-Coin Differing-inputs Obfuscation 

The notion of public-coin differing-inputs obfuscation (PC-diO) was introduced
by Ishai et al. [30] as a refinement of (standard) differing-inputs obfuscation [2]
to exclude certain cases whose feasibility has been called into question. (Note
that we also consider “standard” differing-inputs obfuscation as described in
Sect. 1.2. For a full treatment of the notion and our result, we refer the reader
to the full version of this work [12].)

We now present the PC-diO definition of [30], focusing only on Turing
machine obfuscation; the definition easily extends also to circuits.

Definition 1 (Public-Coin Differing-Inputs Sampler for TMs). An efficient
non-uniform sampling algorithm $\mathrm{Samp} = \{\mathrm{Samp}_k\}$ is called a public-coin
differing-inputs sampler for the parameterized collection of TMs $\mathcal{M} = \{\mathcal{M}_k\}$ if
the output of $\mathrm{Samp}_k$ is always a pair of Turing machines $(M_0, M_1) \in \mathcal{M}_k \times \mathcal{M}_k$
such that $|M_0| = |M_1|$ and for every efficient non-uniform algorithm $A = \{A_k\}$
there exists a negligible function $\epsilon$ such that for all $k \in \mathbb{N}$,

\[
\Pr\big[\, r \leftarrow \{0,1\}^*;\ (M_0, M_1) \leftarrow \mathrm{Samp}_k(r);\ (x, 1^t) \leftarrow A_k(r)
: (M_0(x) \neq M_1(x)) \wedge (\mathrm{steps}(M_0, x) = \mathrm{steps}(M_1, x)) \,\big] < \epsilon(k).
\]

7 That is, [28] shows that for every trapdoor one-to-one function, there exists some way
to instantiate the random oracle so that the resulting scheme is secure. In contrast,
our results show that there exists a single instantiation that works no matter what
the trapdoor function is.

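Definition 2 (Public-Coin Differing-Inputs Obfuscator for TMs).
A uniform PPT algorithm $\mathcal{O}$ is a public-coin differing-inputs obfuscator for the
collection $\mathcal{M} = \{\mathcal{M}_k\}$ if the following requirements hold:

- Correctness: For every $k \in \mathbb{N}$, every $M \in \mathcal{M}_k$, and every x, we have that
$\Pr[\tilde{M} \leftarrow \mathcal{O}(1^k, M) : \tilde{M}(x) = M(x)] = 1$.

- Security: For every public-coin differing-inputs sampler $\mathrm{Samp} = \{\mathrm{Samp}_k\}$
for the ensemble $\mathcal{M}$, and every efficient non-uniform distinguishing algorithm
$\mathcal{D} = \{\mathcal{D}_k\}$, there exists a negligible function $\epsilon$ such that for all k,

\[
\Big|\, \Pr\big[ r \leftarrow \{0,1\}^*;\ (M_0, M_1) \leftarrow \mathrm{Samp}_k(r);\ \tilde{M} \leftarrow \mathcal{O}(1^k, M_0) : \mathcal{D}_k(r, \tilde{M}) = 1 \big]
- \Pr\big[ r \leftarrow \{0,1\}^*;\ (M_0, M_1) \leftarrow \mathrm{Samp}_k(r);\ \tilde{M} \leftarrow \mathcal{O}(1^k, M_1) : \mathcal{D}_k(r, \tilde{M}) = 1 \big] \,\Big| < \epsilon(k).
\]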

2.2 Extractable One-Way Functions 

We present a non-uniform version of the definition, in which both one-wayness 
and extractability are with respect to non-uniform polynomial-time adversaries. 

Definition 3 (Z-Auxiliary-Input EOWF). Let $\ell, m$ be polynomially
bounded length functions. An efficiently computable family of functions

\[
\mathcal{F} = \big\{ f_i : \{0,1\}^k \to \{0,1\}^{\ell(k)} \;\big|\; i \in \{0,1\}^{m(k)},\ k \in \mathbb{N} \big\},
\]

associated with an efficient probabilistic key sampler $\mathcal{K}_{\mathcal{F}}$, is a Z-auxiliary-input
extractable one-way function if it satisfies:

- One-wayness: For every non-uniform poly-time A and sufficiently large $k \in \mathbb{N}$,

\[
\Pr\big[ z \leftarrow Z_k;\ i \leftarrow \mathcal{K}_{\mathcal{F}}(1^k);\ x \leftarrow \{0,1\}^k;\ x' \leftarrow A(i, f_i(x); z)
: f_i(x') = f_i(x) \big] < \mathrm{negl}(k).
\]

- Extractability: For any non-uniform polynomial-time adversary A, there
exists a non-uniform polynomial-time extractor $\mathcal{E}$ such that, for sufficiently
large security parameter $k \in \mathbb{N}$:

\[
\Pr\big[ z \leftarrow Z_k;\ i \leftarrow \mathcal{K}_{\mathcal{F}}(1^k);\ y \leftarrow A(i; z);\ x' \leftarrow \mathcal{E}(i; z)
: \exists x \text{ s.t. } f_i(x) = y \wedge f_i(x') \neq y \big] < \mathrm{negl}(k).
\]

2.3 Succinct Non-Interactive Arguments of Knowledge (SNARKs)

We focus attention on publicly verifiable succinct arguments. We consider succinct
non-interactive arguments of knowledge (SNARKs) with adaptive soundness in 
Sect. 3.2, and consider the case of specific distributional auxiliary input. 

Definition 4 (Z-Auxiliary Input Adaptive SNARK). A triple of algorithms
(CRSGen, Prove, Verify) is a publicly verifiable, adaptively sound succinct
non-interactive argument of knowledge (SNARK) for the relation $\mathcal{R}$ if the
following conditions are satisfied for security parameter k:

- Completeness: For any $(x, w) \in \mathcal{R}$,

\[
\Pr\big[ \mathrm{crs} \leftarrow \mathrm{CRSGen}(1^k);\ \pi \leftarrow \mathrm{Prove}(x, w, \mathrm{crs}) : \mathrm{Verify}(x, \pi, \mathrm{crs}) = 1 \big] = 1.
\]

In addition, Prove(x, w, crs) runs in time poly(k, |y|, t).

- Succinctness: The length of the proof $\pi$ output by Prove(x, w, crs), as well
as the running time of Verify(x, $\pi$, crs), is bounded by p(k + |x|), where p is
a universal polynomial that does not depend on $\mathcal{R}$. In addition, CRSGen($1^k$)
runs in time poly(k): in particular, crs is of length poly(k).

- Adaptive Proof of Knowledge: For any non-uniform polynomial-size
prover $P^*$ there exists a non-uniform polynomial-size extractor $\mathcal{E}_{P^*}$, such that
for all sufficiently large $k \in \mathbb{N}$ and auxiliary input $z \leftarrow Z$, it holds that

\[
\Pr\big[ z \leftarrow Z;\ \mathrm{crs} \leftarrow \mathrm{CRSGen}(1^k);\ (x, \pi) \leftarrow P^*(z, \mathrm{crs});\
(x, w) \leftarrow \mathcal{E}_{P^*}(z, \mathrm{crs}) : \mathrm{Verify}(\mathrm{crs}, x, \pi) = 1 \wedge w \notin \mathcal{R}(x) \big] < \mathrm{negl}(k).
\]

In the full version of this work, we obtain as an application of our succinct 
programs technique zero-knowledge (ZK) succinct non-interactive arguments 
(SNARGs), without the extraction property. We refer the reader to [12] for 
a full treatment. 


2.4 Puncturable PRFs 

Our result makes use of puncturable PRFs, which are PRFs with an extra capa- 
bility to generate keys that allow one to evaluate the function on all bit strings 
of a certain length, except for any polynomial-size set of inputs. We focus on the 
simple case of puncturing PRFs at a single point: that is, given a punctured key 
k* with respect to input x, one can efficiently evaluate the PRF at all points 
except x, whose evaluation remains pseudorandom. We refer the reader to [34]
for a formal definition. 

As observed in [8,11,31], the GGM tree-based PRF construction [22] yields 
puncturable PRFs, based on any one-way function. 
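
To make the GGM-based puncturing concrete, the following Python sketch evaluates a tree-based PRF and punctures it at a single point. It is a minimal illustration under stated assumptions: SHA-256 with a one-byte domain separator stands in heuristically for a length-doubling PRG (it is not a proven PRG), inputs are fixed-length bit strings written as Python strings of '0' and '1', and the function names are illustrative rather than taken from [8,11,31] or [22].

```python
import hashlib
from typing import Dict, Tuple


def prg(seed: bytes) -> Tuple[bytes, bytes]:
    """Heuristic stand-in for a length-doubling PRG G(s) = (G_0(s), G_1(s))."""
    left = hashlib.sha256(b"\x00" + seed).digest()
    right = hashlib.sha256(b"\x01" + seed).digest()
    return left, right


def ggm_eval(seed: bytes, bits: str) -> bytes:
    """GGM evaluation: walk down the binary tree from the root along `bits`."""
    node = seed
    for b in bits:
        node = prg(node)[int(b)]
    return node


def puncture(seed: bytes, point: str) -> Dict[str, bytes]:
    """Punctured key: the sibling node at every level of the path to `point`."""
    key = {}
    node = seed
    for depth, b in enumerate(point):
        children = prg(node)
        sibling = "1" if b == "0" else "0"
        key[point[:depth] + sibling] = children[int(sibling)]
        node = children[int(b)]
    return key


def punctured_eval(key: Dict[str, bytes], bits: str) -> bytes:
    """Evaluate on a full-length input; fails exactly at the punctured point."""
    for prefix, node in key.items():
        if bits.startswith(prefix):
            return ggm_eval(node, bits[len(prefix):])
    raise ValueError("cannot evaluate at the punctured point")


if __name__ == "__main__":
    seed = b"\x07" * 32
    key = puncture(seed, "0101")
    print(punctured_eval(key, "1100") == ggm_eval(seed, "1100"))
    print(len(key))  # one stored node per input bit
```

In this sketch the punctured key stores one tree node per input bit, so its size is the input length times the node length, in line with the key-size bound in the theorem that follows.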

Theorem 4 ([8,11,31]). If one-way functions exist, then for all efficiently
computable $m'(k)$ and $\ell(k)$, there exists a puncturable PRF family that maps $m'(k)$
bits to $\ell(k)$ bits, such that the size of a punctured key is $O(m'(k) \cdot \ell(k))$.

3 Public-Coin Differing-Inputs Obfuscation
or Extractable One-Way Functions

In this section, we present our main result: a conflict between extractable
one-way functions (EOWF) w.r.t. a particular distribution of auxiliary information
and public-coin differing-inputs obfuscation (“PC-diO”) (for Turing Machines).


3.1 From PC-diO to Impossibility of Z-Auxiliary-Input EOWF

We demonstrate a bounded polynomial-time uniformly samplable distribution Z
(with bounded poly-size output length) and a public-coin differing-inputs
sampler for Turing Machines $\mathcal{D}$ (over TM × TM) such that if there exists public-coin
differing-inputs obfuscation for Turing machines (and, in particular, for the
program sampler $\mathcal{D}$), and there exist public-coin collision-resistant hash functions
(CRHF), then there do not exist extractable one-way functions (EOWF) w.r.t.
auxiliary information sampled from distribution Z. In our construction, Z
consists of an obfuscated Turing machine.

We emphasize that we provide a single distribution Z of auxiliary inputs for
which all candidate EOWF families $\mathcal{F}$ with given output length will fail. This
is in contrast to the result of [7], which shows for each candidate family $\mathcal{F}$ that
there exists a tailored distribution $Z_{\mathcal{F}}$ (whose size grows with $|\mathcal{F}|$) for which $\mathcal{F}$
will fail.

Theorem 5. For every polynomial $\ell$, there exists an efficient, uniformly
samplable distribution Z such that, assuming the existence of public-coin
collision-resistant hash functions and public-coin differing-inputs obfuscation for Turing
machines, there cannot exist Z-auxiliary-input extractable one-way
functions $\{f_i : \{0,1\}^k \to \{0,1\}^{\ell(k)}\}$.

Proof. We construct an adversary A and desired distribution Z on auxiliary
inputs, such that for any alleged EOWF family $\mathcal{F}$, there cannot exist an efficient
extractor corresponding to A given auxiliary input from Z (assuming public-coin
CRHFs and PC-diO).

The Universal Adversary A. We consider a universal PPT adversary A that,
given $(i, z) \in \{0,1\}^{\mathrm{poly}(k)} \times \{0,1\}^{n(k)}$, parses z as a Turing machine and returns
z(i). Note that in our setting, i corresponds to the index of the selected function
$f_i \in \mathcal{F}$, and (looking ahead) the auxiliary input z will contain an obfuscated
program.

The Auxiliary Input Distribution Z. Let
$\mathcal{PRF} = \{\mathrm{PRF}_s : \{0,1\}^{m(k)} \to \{0,1\}^{k}\}_{s \in \{0,1\}^k}$
be a puncturable pseudorandom function family, and $\mathcal{H} = \{\mathcal{H}_k\}$
a public-coin collision-resistant hash function family with $h : \{0,1\}^* \to \{0,1\}^{m(k)}$
for each $h \in \mathcal{H}_k$. (Note that by Theorem 4, punctured PRFs for these parameters
exist based on OWFs, which are implied by CRHF.) We begin by defining two
classes of Turing machines:

\[
\mathcal{M} = \big\{ \Pi^{h,s} \;\big|\; s \in \{0,1\}^k,\ h \in \mathcal{H}_k,\ k \in \mathbb{N} \big\},
\]
\[
\mathcal{M}^* = \big\{ \Pi^{h,s}_{i,y} \;\big|\; s \in \{0,1\}^k,\ y \in \{0,1\}^{\ell(k)},\ h \in \mathcal{H}_k,\ k \in \mathbb{N} \big\},
\]


which we now describe. We assume without loss of generality for each k that the
corresponding collections of Turing machines $\Pi^{h,s} \in \mathcal{M}_k$, $\Pi^{h,s}_{i,y} \in \mathcal{M}^*_k$ are of the
same size; this can be achieved by padding. (We address the size bound of each
class of machines below.) In a similar fashion, we may further assume that for
each k the runtime of each $\Pi^{h,s}$ and $\Pi^{h,s}_{i,y}$ on any given input $f_i$ is equal.

At a high level, each machine $\Pi^{h,s}$ accepts as input a poly-size circuit
description of a function $f_i$ (with canonical description, including a function index i),
computes the hash of the corresponding index i w.r.t. the hardcoded hash
function h, applies a PRF with hardcoded seed s to the hash, and then evaluates
the circuit $f_i$ on the resulting PRF output value x: that is, $\Pi^{h,s}(f_i)$ outputs
$U_k(f_i, \mathrm{PRF}_s(h(i)))$, where $U_k$ is the universal Turing machine. See Fig. 1. Note
that each $\Pi^{h,s}$ can be described by a Turing machine of size $O(|s| + |h| + |U_k|)$,
which is bounded by p(k) for some fixed polynomial p.


Turing Machine $\Pi^{h,s}$:

Hardwired: Hash function $h : \{0,1\}^* \to \{0,1\}^{m(k)}$, PRF seed $s \in \{0,1\}^k$.

Inputs: Circuit description $f_i$

1. Hash the index: v = h(i).

2. Compute the PRF on this hash: $x = \mathrm{PRF}_s(v)$.

3. Output the evaluation of the universal Turing machine on inputs $f_i, x$: i.e.,
$y = U_k(f_i, x)$.


Fig. 1. Turing machines $\Pi^{h,s} \in \mathcal{M}$.


Auxiliary Input Distribution $Z_k$:

1. Sample a hash function $h \leftarrow \mathcal{H}_k$ and PRF seed $s \leftarrow \mathcal{K}_{\mathrm{PRF}}(1^k)$.

2. Output an obfuscation $\tilde{M} \leftarrow \text{PC-diO}(\Pi^{h,s})$.


Fig. 2. The auxiliary input distribution $Z_k$.
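
The data flow of Figs. 1 and 2, together with the universal adversary, can be sketched in Python as follows. This is a toy model and not the paper's construction: keyed SHA-256 stands in for the public-coin CRHF h, HMAC-SHA256 stands in for the PRF, a Python callable stands in for the circuit description of f_i, and placeholder_obfuscate is a hypothetical stub that performs no obfuscation at all. All function names are assumptions made for illustration.

```python
import hashlib
import hmac
import os


def make_program(hash_key: bytes, seed: bytes):
    """Toy version of the machine of Fig. 1: (i, f_i) -> f_i(PRF_s(h(i)))."""
    def program(index: bytes, f):
        v = hashlib.sha256(hash_key + index).digest()   # step 1: v = h(i)
        x = hmac.new(seed, v, hashlib.sha256).digest()  # step 2: x = PRF_s(v)
        return f(x)                                     # step 3: y = f_i(x)
    return program


def placeholder_obfuscate(program):
    """NOT an obfuscator: a stub marking where PC-diO would be applied."""
    return program


def sample_aux_input(k_bytes: int = 32):
    """Toy version of Fig. 2: sample h and s, output the 'obfuscated' program."""
    hash_key = os.urandom(k_bytes)
    seed = os.urandom(k_bytes)
    return placeholder_obfuscate(make_program(hash_key, seed))


def universal_adversary(index: bytes, f, z):
    """A(i; z): interpret z as a program and run it on the description of f_i."""
    return z(index, f)


if __name__ == "__main__":
    toy_f = lambda x: hashlib.sha256(b"toy-function" + x).digest()
    z = sample_aux_input()
    short = universal_adversary(b"i" * 10, toy_f, z)
    long_ = universal_adversary(b"i" * 10000, toy_f, z)
    print(len(short), len(long_))
```

The program's hardwired state consists only of the hash key and the PRF seed, so its description does not grow with the length of the index it is run on; this is the property the succinct technique relies on.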


The machines $\Pi^{h,s}_{i,y}$ perform a similar task, except that instead of having the
entire PRF seed s hardcoded, they instead only have a punctured seed $s^*$ derived
from s by puncturing it at the point h(i) (i.e., enabling evaluation of the PRF
on all points except h(i)). In addition, each has hardwired an output y to replace
the punctured result. More specifically, on input a circuit description $f_j$ (with
explicitly specified index j), the program $\Pi^{h,s}_{i,y}$ first computes the hash v = h(j),
continues computation as usual for any $v \neq h(i)$ using the punctured PRF key,
and for v = h(i), it skips the PRF and $U_k$ evaluation steps and directly outputs
y. Note that because h is not injective, this puncturing may change the value
of the program on multiple inputs $f_j$ (corresponding to functions $f_j \in \mathcal{F}$ with
h(j) = h(i)). When the hardcoded value y is set to $y = f_i(\mathrm{PRF}_s(h(i)))$, then
$\Pi^{h,s}_{i,y}$ agrees with $\Pi^{h,s}$ additionally on the input $f_i$, but not necessarily on the
other inputs $f_j$ for which h(j) = h(i). (Indeed, whereas the hashes of their indices
collide, and thus their corresponding PRF outputs, $\mathrm{PRF}_s(h(j))$, will agree, the
final step will apply different functions $f_j$ to this value.)
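
The following Python sketch illustrates where the original and punctured programs disagree. It is a toy experiment under explicit assumptions: the hash is deliberately truncated to 8 bits so that collisions can be found by brute force (the opposite of a real CRHF), HMAC-SHA256 stands in for the PRF, the family f_j is a made-up hash-based family, and the punctured program is modeled by simply never calling the PRF at h(i) instead of using a real punctured key. All names are illustrative.

```python
import hashlib
import hmac


def weak_hash(index: bytes) -> int:
    """Deliberately weak 8-bit hash, so collisions are easy to find."""
    return hashlib.sha256(index).digest()[0]


def prf(seed: bytes, v: int) -> bytes:
    """HMAC-SHA256 as a stand-in PRF on the one-byte hash value."""
    return hmac.new(seed, bytes([v]), hashlib.sha256).digest()


def f(index: bytes, x: bytes) -> bytes:
    """Toy family {f_j}: the index j is mixed into the evaluation."""
    return hashlib.sha256(index + b"|" + x).digest()


def original(seed: bytes, j: bytes) -> bytes:
    """Unpunctured program: f_j(PRF_s(h(j)))."""
    return f(j, prf(seed, weak_hash(j)))


def punctured(seed: bytes, i: bytes, y: bytes, j: bytes) -> bytes:
    """Punctured program: hardcoded y whenever h(j) = h(i)."""
    if weak_hash(j) == weak_hash(i):
        return y  # the PRF is never evaluated at h(i)
    return f(j, prf(seed, weak_hash(j)))


def differing_inputs(seed: bytes, i: bytes, candidates):
    """Inputs where the two programs disagree when y is the correct value."""
    y = original(seed, i)
    return [j for j in candidates if original(seed, j) != punctured(seed, i, y, j)]


if __name__ == "__main__":
    seed, i = b"s" * 32, b"index-0"
    candidates = [b"index-%d" % n for n in range(2000)]
    for j in differing_inputs(seed, i, candidates):
        print(j, weak_hash(j) == weak_hash(i))
```

Every input reported by differing_inputs is an index j different from i with the same hash as i, that is, a collision. With a collision-resistant h such inputs are infeasible to find, which is the fact the differing-inputs argument uses.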

We first remark that indistinguishability obfuscation arguments will thus not
apply to this scenario, since we are modifying the computed functionality. In
contrast, differing-inputs obfuscation would guarantee that the two obfuscated
programs are indistinguishable, since otherwise we could efficiently find one of
the disagreeing inputs, which would correspond to a collision in the CRHF. But,
most importantly, this argument holds even if the randomness used to sample the
program pair $(\Pi^{h,s}, \Pi^{h,s}_{i,y})$ is revealed. Namely, we consider a program sampler
that generates pairs $(\Pi^{h,s}, \Pi^{h,s}_{i,y})$ of the corresponding distribution; this amounts
to sampling a hash function h, an EOWF challenge index i, a PRF seed
s, and an h(i)-puncturing of the seed, $s^*$. All remaining values specifying the
programs, such as $y = f_i(\mathrm{PRF}_s(h(i)))$, are deterministically computed given
$(h, i, s, s^*)$. Now, since $\mathcal{H}$ is a public-coin CRHF family, revealing the randomness
used to sample $h \leftarrow \mathcal{H}$ is not detrimental to its collision resistance. And, the
values i, s, and $s^*$ are completely independent of the CRHF security (i.e., a CRHF
adversary reduction could simply generate them on its own in order to break h).
Therefore, we ultimately need only rely on public-coin diO.

We finally consider the size of the program(s) to be obfuscated. Note that each
$\Pi^{h,s}_{i,y}$ can be described by a Turing machine of size $O(|s^*| + |h| + |y| + |U_k|)$. Recall
that by Theorem 4 the size of the punctured PRF key is $|s^*| \in O(m'(k)\,\ell(k))$, where the
PRF has input and output lengths $m'(k)$ and $\ell(k)$. In our application, note that
the input to the PRF is not the function index i itself (in which case the machine
$\Pi^{h,s}_{i,y}$ would need to grow with the size of the alleged EOWF family), but rather
the hashed index h(i), which is of fixed polynomial length. Thus, collectively, we
have that $|\Pi^{h,s}_{i,y}|$ is bounded by a fixed polynomial $p'(k)$, and finally that there exists a
single fixed polynomial bound on the size of all programs $\Pi^{h,s} \in \mathcal{M}$, $\Pi^{h,s}_{i,y} \in \mathcal{M}^*$.
This completely determines the auxiliary input distribution $Z = \{Z_k\}$, described
in full in Fig. 2. (Note that the size of the auxiliary output generated by Z, which
corresponds to an obfuscation of an appropriately padded program $\Pi^{h,s}$, is thus
also bounded by a fixed polynomial in k.)

A Has No Extractor. We show that, based on the assumed security of the
underlying tools, the constructed adversary A given auxiliary input from the
constructed distribution $Z = \{Z_k\}$, cannot have an extractor $\mathcal{E}$ satisfying
Definition 3:

Turing Machine $\Pi^{h,s}_{i,y}$:

Hardwired: Hash function $h : \{0,1\}^* \to \{0,1\}^{m(k)}$, punctured PRF seed $s^* \in \{0,1\}^k$,
punctured point h(i), bit string $y \in \{0,1\}^{\ell(k)}$.

Input: Circuit description $f_j$ (containing index j)

1. Hash the index: v = h(j).

2. If $v \neq h(i)$, compute $x = \mathrm{PRF}_{s^*}(v)$, and output $U_k(f_j, x)$.

3. If v = h(i), output y.


Fig. 3. “Punctured” Turing machines $\Pi^{h,s}_{i,y} \in \mathcal{M}^*$.


Auxiliary Input Distribution $Z_k(i, y)$:

1. Sample a hash function $h \leftarrow \mathcal{H}_k$ and PRF seed $s \leftarrow \mathcal{K}_{\mathrm{PRF}}(1^k)$.

2. Sample a punctured PRF seed $s^* \leftarrow \mathrm{Punct}(s, h(i))$, punctured at point h(i).

3. Compute the “correct” punctured evaluation: $y = f_i(\mathrm{PRF}_s(h(i)))$.

4. Output an obfuscation $\tilde{M} \leftarrow \text{PC-diO}(\Pi^{h,s}_{i,y})$, where $\Pi^{h,s}_{i,y}$ is defined from $(h, s^*, y)$,
as in Figure 3.


Fig. 4. The “punctured” distribution $Z_k(i, y)$.


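Proposition 1. For any non-uniform polynomial-time candidate extractor $\mathcal{E}$
for A, it holds that $\mathcal{E}$ fails with overwhelming probability: i.e.,

\[
\Pr\big[ z \leftarrow Z_k;\ i \leftarrow \mathcal{K}_{\mathcal{F}}(1^k);\ y \leftarrow A(i; z);\ x' \leftarrow \mathcal{E}(i; z)
: \exists x \text{ s.t. } f_i(x) = y \wedge f_i(x') \neq y \big] > 1 - \mathrm{negl}(k).
\]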


Proof. First note that given auxiliary input $z \leftarrow Z_k$, A produces an element in
the image of the selected $f_i$ with high probability. That is,

\[
\Pr\big[ z \leftarrow Z_k;\ i \leftarrow \mathcal{K}_{\mathcal{F}}(1^k);\ y \leftarrow A(i; z) : \exists x \text{ s.t. } f_i(x) = y \big] > 1 - \mathrm{negl}(k).
\]

Indeed, by the definition of A and $Z_k$, and the correctness of the obfuscator
PC-diO, we have with overwhelming probability

\[
A(i; z) = \tilde{M}(f_i) = \Pi^{h,s}(f_i) = f_i(\mathrm{PRF}_s(h(i))),
\]

where $z = \tilde{M}$ is an obfuscation of $\Pi^{h,s} \in \mathcal{M}$; i.e., $z = \tilde{M} \leftarrow \text{PC-diO}(\Pi^{h,s})$.

Now, suppose for contradiction that there exists a non-negligible function
$\epsilon(k)$ such that for all $k \in \mathbb{N}$ the extractor $\mathcal{E}$ successfully outputs a preimage
corresponding to the output $\mathcal{A}(i;z) \in \mathrm{Range}(f_i)$ with probability $\epsilon(k)$: i.e.,

$$\Pr\left[ z \leftarrow Z_k;\ i \leftarrow \mathcal{K}_{\mathcal{F}}(1^k);\ x' \leftarrow \mathcal{E}(i;z) \;:\; f_i(x') = \mathcal{A}(i;z) = f_i(\mathsf{PRF}_s(h(i))) \right] \geq \epsilon(k),$$

where, as before, $s, h$ are such that $z = \mathsf{PC\text{-}diO}(\Pi^{h,s})$. We show that this
cannot be the case, via three steps.

Step 1: Replace $Z$ with “punctured” distribution $Z(i,y)$. For every index $i$ of the
EOWF family $\mathcal{F}$ and $k \in \mathbb{N}$, consider an alternative distribution $Z_k(i,y)$ that,
instead of sampling and obfuscating a Turing machine $\Pi^{h,s}$ from the class $\mathcal{M}$,
as is done for $Z$, does so with a Turing machine $\Pi^{h,s^*}_{i,y} \in \mathcal{M}^*$ as follows. First,
it samples a hash function $h \leftarrow \mathcal{H}_k$ and PRF seed $s$ as usual. It then generates
a punctured PRF key $s^* \leftarrow \mathsf{Punct}(s, h(i))$ that enables evaluation of the PRF on
all points except the value $h(i)$. For the specific index $i$, it computes the correct
full evaluation $y := f_i(\mathsf{PRF}_s(h(i)))$. Finally, $Z_k(i,y)$ outputs an obfuscation of
the constructed program $\Pi^{h,s^*}_{i,y}$ as specified in Fig. 3 from the values $(h, s^*, y)$:
i.e., $\tilde{M} \leftarrow \mathsf{PC\text{-}diO}(\Pi^{h,s^*}_{i,y})$. See Fig. 4 for a full description of $Z(i,y)$.
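To make the role of the punctured key concrete, here is a minimal Python sketch of a puncturable PRF built from the standard GGM tree construction. The paper does not fix a particular puncturable PRF, so the construction, the use of SHA-256 as a stand-in for a length-doubling PRG, the 16-bit toy domain, and all function names below are assumptions made for illustration only.

```python
import hashlib

N_BITS = 16  # toy input length (assumption); the paper's PRF takes m(k)-bit inputs


def _prg(seed: bytes, bit: int) -> bytes:
    """One half of a length-doubling PRG, modeled here by SHA-256."""
    return hashlib.sha256(bytes([bit]) + seed).digest()


def _bits(x: int) -> list:
    """Big-endian bit decomposition of x, N_BITS long."""
    return [(x >> (N_BITS - 1 - j)) & 1 for j in range(N_BITS)]


def prf(seed: bytes, x: int) -> bytes:
    """GGM evaluation: walk the tree from the root along the bits of x."""
    node = seed
    for b in _bits(x):
        node = _prg(node, b)
    return node


def punct(seed: bytes, x_star: int) -> dict:
    """Punctured key: the off-path sibling of every node on the path to x_star."""
    siblings = []
    node = seed
    for b in _bits(x_star):
        siblings.append(_prg(node, 1 - b))  # child that leaves the path
        node = _prg(node, b)                # child that stays on the path
    return {"point": x_star, "siblings": siblings}


def prf_punctured(key: dict, x: int) -> bytes:
    """Evaluate at any x other than the punctured point, using only the punctured key."""
    if x == key["point"]:
        raise ValueError("cannot evaluate at the punctured point")
    xb, pb = _bits(x), _bits(key["point"])
    # First position where x leaves the path to the punctured point.
    d = next(j for j in range(N_BITS) if xb[j] != pb[j])
    node = key["siblings"][d]
    for b in xb[d + 1:]:
        node = _prg(node, b)
    return node
```

In this sketch, `punct(s, h_i)` plays the role of $\mathsf{Punct}(s, h(i))$: the returned key agrees with `prf(s, x)` on every `x` other than `h_i` and contains no node on the path to `h_i`, which is why the program of Fig. 3 needs the value $y$ hardwired for that one point.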

We now argue that the extractor $\mathcal{E}$ must also succeed in extracting a preimage
when given a value $z^* \leftarrow Z_k(i,y)$ from this modified distribution instead of $Z_k$.
Consider the Turing Machine program sampler algorithm Samp as in Fig. 5.


Program Pair Sampler $\mathsf{Samp}(1^k, r)$:

1. Sample a hash function $h = \mathcal{H}_k(r_h)$.

2. Sample an EOWF index $i = \mathcal{K}_{\mathcal{F}}(1^k; r_i)$.

3. Sample a PRF seed $s = \mathcal{K}_{\mathsf{PRF}}(1^k; r_s)$.

4. Sample a punctured PRF seed $s^* = \mathsf{Punct}(s, h(i); r_*)$.

5. Let $y = f_i(\mathsf{PRF}_s(h(i)))$.

6. Denote $r := (r_h, r_i, r_s, r_*)$.

7. Output program pair $(\Pi^{h,s}, \Pi^{h,s^*}_{i,y})$, defined by $h, i, s, s^*, y$ as above (and padded to
equal length).


Fig. 5. Program pair sampler algorithm, to be used in public-coin differing inputs
security step.


We first argue that, by the (public-coin) collision resistance of the hash family
$\mathcal{H}$, the sampler algorithm Samp is a public-coin differing-inputs sampler, as per
Definition 1.

Claim. Samp is a public-coin differing-inputs sampler. That is, for all efficient
non-uniform $\mathcal{A}_{PC}$, there exists a negligible function $\epsilon$ such that for all $k \in \mathbb{N}$,

$$\Pr\left[ r \leftarrow \{0,1\}^*;\ (M_0, M_1) \leftarrow \mathsf{Samp}(1^k, r);\ (x, 1^t) \leftarrow \mathcal{A}_{PC}(1^k, r) \;:\; M_0(x) \neq M_1(x) \,\wedge\, \mathrm{steps}(M_0, x) = \mathrm{steps}(M_1, x) = t \right] \leq \epsilon(k). \tag{1}$$

Proof. Suppose, to the contrary, there exists an efficient (non-uniform) adversary
$\mathcal{A}_{PC}$ and non-negligible function $\alpha(k)$ for which the probability in Eq. (1) is
greater than $\alpha(k)$. We show such an adversary contradicts the security of the
(public-coin) CRHF. Consider an adversary $\mathcal{A}_{CR}$ in the CRHF security challenge.
Namely, for a challenge hash function $h \leftarrow \mathcal{H}_k(r_h)$, the adversary $\mathcal{A}_{CR}$ receives
$h, r_h$, and performs the following steps:




CRHF adversary $\mathcal{A}_{CR}(1^k, h, r_h)$:

1. Imitate the remaining steps of Samp. That is, sample an EOWF index
$i = \mathcal{K}_{\mathcal{F}}(1^k; r_i)$; a PRF seed $s = \mathcal{K}_{\mathsf{PRF}}(1^k; r_s)$; and a punctured PRF seed
$s^* = \mathsf{Punct}(s, h(i); r_*)$. Define $y = f_i(\mathsf{PRF}_s(h(i)))$ and $r = (r_h, r_i, r_s, r_*)$,
and let $M_0 = \Pi^{h,s}$ and $M_1 = \Pi^{h,s^*}_{i,y}$.

2. Run $\mathcal{A}_{PC}(1^k, r)$ on the collection of randomness $r$ used above. In response,
$\mathcal{A}_{PC}$ returns a pair $(x, 1^t)$.

3. $\mathcal{A}_{CR}$ outputs the pair $(i, x)$ as an alleged collision in the challenge hash
function $h$.

Now, by assumption, the value $x$ generated by $\mathcal{A}_{PC}$ satisfies (in particular) that
$M_0(x) \neq M_1(x)$. From the definition of $M_0, M_1$ (i.e., $\Pi^{h,s}, \Pi^{h,s^*}_{i,y}$), this must mean
that $h(i) = h(x)$ (since all values with $h(x) \neq h(i)$ were not changed from $\Pi^{h,s}$
to $\Pi^{h,s^*}_{i,y}$), and that $i \neq x$ (since $\Pi^{h,s^*}_{i,y}(i)$ was specifically “patched” to the correct
output value $\Pi^{h,s}(i)$). That is, $\mathcal{A}_{CR}$ successfully identifies a collision with the
same probability $\alpha(k)$, which must thus be negligible.
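The core of this argument, that any input on which the two programs disagree collides with $i$ under $h$, can be checked on a toy model. The Python sketch below is only an illustration under loud assumptions: the hash is truncated to one byte so that collisions can be found by brute force, HMAC-SHA256 stands in for the PRF, a SHA-256-based family stands in for $U_k(f_j, \cdot)$, and every function name is hypothetical.

```python
import hashlib
import hmac


def h(j: int) -> int:
    """Toy hash with a 1-byte range (assumption), so collisions are easy to find."""
    return hashlib.sha256(b"h" + j.to_bytes(4, "big")).digest()[0]


def prf(s: bytes, v: int) -> bytes:
    """Stand-in PRF (HMAC-SHA256); the paper's PRF is puncturable."""
    return hmac.new(s, bytes([v]), hashlib.sha256).digest()


def f(j: int, x: bytes) -> bytes:
    """Stand-in for U_k(f_j, x): a function family indexed by j."""
    return hashlib.sha256(b"f" + j.to_bytes(4, "big") + x).digest()


def make_programs(s: bytes, i: int):
    y = f(i, prf(s, h(i)))  # the "correct" value hardwired for the point h(i)

    def pi(j: int) -> bytes:            # models the unpunctured program
        return f(j, prf(s, h(j)))

    def pi_punctured(j: int) -> bytes:  # models the punctured program of Fig. 3
        v = h(j)
        # A real punctured key cannot evaluate at h(i); here we simply never
        # call prf on that point and output the hardwired y instead.
        return f(j, prf(s, v)) if v != h(i) else y

    return pi, pi_punctured


def find_differing_input(s: bytes, i: int, search_space: range):
    """Brute-force search for an input on which the two programs disagree."""
    pi, pi_punctured = make_programs(s, i)
    for x in search_space:
        if pi(x) != pi_punctured(x):
            return x
    return None
```

Any `x` returned by `find_differing_input` satisfies `h(x) == h(i)` and `x != i`, so `(i, x)` is a collision, which mirrors step 3 of the CRHF adversary. With a collision-resistant hash of full output length, the same search would be infeasible.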

We now show that this implies, by the security of the public-coin diO, that
our original EOWF extractor $\mathcal{E}$ must succeed with nearly equivalent probability
in the EOWF challenge when instead of receiving (real) auxiliary input from
$Z_k$, both $\mathcal{E}$ and $\mathcal{A}$ are given auxiliary input from the fake distribution $Z_k(i,y)$.
(Recall that $\epsilon$ is assumed to be $\mathcal{E}$’s success in the same experiment as below but
with $z \leftarrow Z_k$ instead of $Z_k(i,y)$.)

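Lemma 1. It holds that

$$\Pr\left[ i \leftarrow \mathcal{K}_{\mathcal{F}}(1^k);\ z^* \leftarrow Z_k(i,y);\ x' \leftarrow \mathcal{E}(i;z^*) \;:\; f_i(x') = \mathcal{A}(i;z^*) = f_i(\mathsf{PRF}_s(h(i))) \right] \geq \epsilon(k) - \mathrm{negl}(k). \tag{2}$$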


Proof. Note that given $z^* \leftarrow Z_k(i,y)$ (which corresponds to an obfuscated
program of the form $\Pi^{h,s^*}_{i,y}$), our EOWF adversary $\mathcal{A}$ indeed will still output
$y = f_i(\mathsf{PRF}_s(h(i)))$ (see Figs. 3, 4).

Now, suppose there exists a non-negligible function $\alpha(k)$ for which the probability
in Eq. (2) is less than $\epsilon(k) - \alpha(k)$. We directly use such $\mathcal{E}$ to design
another adversary $\mathcal{A}_{diO}$ to contradict the security of the public-coin diO with
respect to the program pair sampler Samp (which we showed in Claim 3.1 to be
a valid public-coin differing-inputs sampler). Recall the diO challenge samples a
program pair $(\Pi^{h,s}, \Pi^{h,s^*}_{i,y}) \leftarrow \mathsf{Samp}(1^k, r)$, selects a random $M \leftarrow \{\Pi^{h,s}, \Pi^{h,s^*}_{i,y}\}$
to obfuscate as $\tilde{M} \leftarrow \mathsf{PC\text{-}diO}(1^k, M)$, and gives as a challenge the pair $(r, \tilde{M})$
of the randomness used by Samp and obfuscated program. Define $\mathcal{A}_{diO}$ (who
wishes to distinguish which program was selected) as follows.
PC-diO adversary $\mathcal{A}_{diO}(1^k, r, \tilde{M})$:

1. Parse the given randomness $r$ used in Samp as $r = (r_h, r_i, r_s, r_*)$ (see
Fig. 5).

2. Recompute the “challenge index” $i = \mathcal{K}_{\mathcal{F}}(1^k; r_i)$. Let $z^* = \tilde{M}$.

3. Run the extractor algorithm $\mathcal{E}(i; z^*)$, and receive an alleged preimage $x'$.

4. Recompute $h = \mathcal{H}_k(r_h)$, $s = \mathcal{K}_{\mathsf{PRF}}(1^k; r_s)$, again using the randomness
from $r$.

5. If $f_i(x') = f_i(\mathsf{PRF}_s(h(i)))$ (i.e., if $\mathcal{E}$ succeeded in extracting a preimage),
then $\mathcal{A}_{diO}$ outputs 1. Otherwise, $\mathcal{A}_{diO}$ outputs 0.

Now, if $\tilde{M}$ is an obfuscation of $\Pi^{h,s}$, then this experiment corresponds directly
to the EOWF challenge where $\mathcal{E}$ (and $\mathcal{A}$) is given auxiliary input $z \leftarrow Z_k$.
On the other hand, if $\tilde{M}$ is an obfuscation of $\Pi^{h,s^*}_{i,y}$, then the experiment corresponds
directly to the same challenge where $\mathcal{E}$ (and $\mathcal{A}$) is given auxiliary input
$z^* \leftarrow Z_k(i,y)$. Thus, $\mathcal{A}_{diO}$ will succeed in distinguishing these two cases with
probability at least $[\epsilon(k)] - [\epsilon(k) - \alpha(k)] = \alpha(k)$. By the security of $\mathsf{PC\text{-}diO}$, it
hence follows that $\alpha(k)$ must be negligible.
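The distinguishing gap can be spelled out as follows. This is only a restatement of the two cases of the proof, writing $\tilde{M}$ for the obfuscated challenge program (a notational assumption) and conditioning on which program of the pair was obfuscated.

```latex
\begin{align*}
\Pr\bigl[\mathcal{A}_{diO} = 1 \,\bigm|\, \tilde{M} \text{ obfuscates } \Pi^{h,s}\bigr]
  &\ge \epsilon(k)
  && \text{(assumed success of } \mathcal{E} \text{ on } z \leftarrow Z_k\text{)} \\
\Pr\bigl[\mathcal{A}_{diO} = 1 \,\bigm|\, \tilde{M} \text{ obfuscates } \Pi^{h,s^*}_{i,y}\bigr]
  &< \epsilon(k) - \alpha(k)
  && \text{(hypothesis that Eq. (2) fails)} \\
\Longrightarrow\quad
\text{difference of the two probabilities}
  &> \alpha(k).
\end{align*}
```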

Step 2: Replace “correct” hardcoded $y$ in $Z(i,y)$ with random $f_i$ evaluation. Next,
we consider another experiment where $Z_k(i,y)$ is altered to a nearly identical
distribution $Z_k(i,u)$ where, instead of hardcoding the “correct” $i$-evaluation value
$y = f_i(\mathsf{PRF}_s(h(i)))$ in the generated “punctured” program $\Pi^{h,s^*}_{i,y}$, the distribution
$Z_k(i,u)$ now simply samples a random $f_i$ output $y = f_i(u)$ for an independent
random $u \leftarrow \{0,1\}^k$. We claim that the original EOWF extractor $\mathcal{E}$ still succeeds
in finding a preimage when given this new auxiliary input distribution:

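Lemma 2. It holds that

$$\Pr\left[ i \leftarrow \mathcal{K}_{\mathcal{F}}(1^k);\ z^{**} \leftarrow Z_k(i,u);\ x' \leftarrow \mathcal{E}(i;z^{**}) \;:\; f_i(x') = \mathcal{A}(i;z^{**}) = f_i(u) \right] \geq \epsilon(k) - \mathrm{negl}(k). \tag{3}$$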


Proof. This follows from the fact that $\mathsf{PRF}_s(h(i))$ is pseudorandom, even given
the $h(i)$-punctured key $s^*$.

Formally, consider an algorithm $\mathcal{A}^0_{\mathsf{PRF}}$ which, on input the security parameter
$1^k$, a pair of values $i, h$, and a pair $s^*, x$ (that will eventually correspond to a
challenge punctured PRF key, and either $\mathsf{PRF}_s(h(i))$ or random $u$), performs the
following steps.

Algorithm $\mathcal{A}^0_{\mathsf{PRF}}(1^k, i, h, s^*, x)$:

1. Take $y = f_i(x)$, and obfuscate the associated program $\Pi^{h,s^*}_{i,y}$: i.e.,
$z^{**} \leftarrow \mathsf{PC\text{-}diO}(1^k, \Pi^{h,s^*}_{i,y})$.

2. Run the EOWF extractor given index $i$ and auxiliary input $z^{**}$:
$x' \leftarrow \mathcal{E}(i; z^{**})$.

3. Output 0 if $\mathcal{E}$ succeeds in extracting a valid preimage: i.e., if $f_i(x') = y$.
Otherwise, output a random bit $b \leftarrow \{0,1\}$.

Now, suppose Lemma 2 does not hold: i.e., the probability in Eq. (3) differs
by some non-negligible amount from $\epsilon(k)$. Then, expanding out the sampling
procedure of $Z_k(i,y)$ and $Z_k(i,u)$, we have for some non-negligible function
$\alpha(k)$ that

$$\Pr\left[ i \leftarrow \mathcal{K}_{\mathcal{F}}(1^k);\ h \leftarrow \mathcal{H}_k;\ s \leftarrow \mathcal{K}_{\mathsf{PRF}}(1^k);\ s^* \leftarrow \mathsf{Punct}(s, h(i));\ u \leftarrow \{0,1\}^k;\ b \leftarrow \{0,1\} \;:\; \mathcal{A}^0_{\mathsf{PRF}}(1^k, i, h, s^*, x_b) = b \right] \geq \frac{1}{2} + \alpha(k), \tag{4}$$

where $x_0 := \mathsf{PRF}_s(h(i))$ and $x_1 := u$. Indeed, in the case $b = 0$, the auxiliary
input $z^{**}$ generated by $\mathcal{A}^0_{\mathsf{PRF}}$ and given to $\mathcal{E}$ has distribution exactly $Z(i,y)$,
whereas in the case $b = 1$, the generated $z^{**}$ has distribution exactly $Z(i,u)$.
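The step from the gap in extraction probabilities to the bound in Eq. (4) can be worked out explicitly. The calculation below is a sketch; the names $p_0$, $p_1$ (the probability that the extractor succeeds when $b = 0$ and $b = 1$, respectively) and $\alpha'$ are introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
\Pr[\text{output} = 0 \mid b = 0] &= p_0 + \tfrac{1}{2}(1 - p_0) = \tfrac{1}{2}(1 + p_0), \\
\Pr[\text{output} = 1 \mid b = 1] &= \tfrac{1}{2}(1 - p_1), \\
\Pr[\text{output} = b]
  &= \tfrac{1}{2} \cdot \tfrac{1}{2}(1 + p_0) + \tfrac{1}{2} \cdot \tfrac{1}{2}(1 - p_1)
   = \tfrac{1}{2} + \tfrac{1}{4}(p_0 - p_1).
\end{align*}
```

By Lemma 1, $p_0 \geq \epsilon(k) - \mathrm{negl}(k)$. If Lemma 2 fails, then $p_1 < \epsilon(k) - \alpha'(k)$ for some non-negligible $\alpha'$, so $p_0 - p_1 > \alpha'(k) - \mathrm{negl}(k)$, and Eq. (4) holds with $\alpha(k) = \tfrac{1}{4}(\alpha'(k) - \mathrm{negl}(k))$.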

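In particular, there exists a polynomial $p(k)$ such that for infinitely many $k$,
there exists an index $i_k$ and hash function $h_k \in \mathcal{H}_k$ with

$$\Pr\left[ s \leftarrow \mathcal{K}_{\mathsf{PRF}}(1^k);\ s^* \leftarrow \mathsf{Punct}(s, h_k(i_k));\ u \leftarrow \{0,1\}^k;\ b \leftarrow \{0,1\} \;:\; \mathcal{A}^0_{\mathsf{PRF}}(1^k, i_k, h_k, s^*, x_b) = b \right] \geq \frac{1}{2} + \frac{1}{p(k)}, \tag{5}$$

where $x_0, x_1$ are as before.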

Consider a non-uniform punctured-PRF adversary $\mathcal{A}^I_{\mathsf{PRF}}$ (with the ensemble
$I = \{i_k, h_k\}$ hardcoded) that first selects the challenge point $h_k(i_k)$; receives
the PRF challenge information $(s^*, x)$ for this point; executes $\mathcal{A}^0_{\mathsf{PRF}}$ on input
$(1^k, i_k, h_k, s^*, x)$, and outputs the corresponding bit $b$ output by $\mathcal{A}^0_{\mathsf{PRF}}$. Then by
(5), it follows that $\mathcal{A}^I_{\mathsf{PRF}}$ breaks the security of the punctured PRF.


Step 3: Such an extractor breaks one-wayness of EOWF. Finally, we observe that
this means that $\mathcal{E}$ can be used to break the one-wayness of the original function
family $\mathcal{F}$. Indeed, given a random key $i$ and a challenge output $y = f_i(u)$, an
inverter can simply sample a hash function $h$ and $h(i)$-punctured PRF seed $s^*$
on its own, construct the program $\Pi^{h,s^*}_{i,y}$ with its challenge $y$ hardcoded in, and
sample an obfuscation $z^{**} \leftarrow \mathsf{PC\text{-}diO}(\Pi^{h,s^*}_{i,y})$. Finally, it runs $\mathcal{E}(i, z^{**})$ to invert
$y$, with the same probability $\epsilon(k) - \mathrm{negl}(k)$.

This concludes the proof of Theorem 5. 


3.2 PC-diO or SNARKs

We link the existence of public-coin differing-inputs obfuscation for $\mathsf{NC}^1$ and
the existence of succinct non-interactive arguments of knowledge (SNARKs),
via an intermediate step of proximity extractable one-way functions (PEOWFs),
a notion related to EOWFs, introduced in [5]. Namely, assume the existence of
fully homomorphic encryption (FHE) with decryption in $\mathsf{NC}^1$ and public-coin
collision-resistant hash functions. Then, building upon the results of the previous
subsection, and the results of [5,30], we show:

1. Assuming SNARKs for NP, there exists an efficient distribution $Z$ such that
public-coin differing-inputs obfuscation for $\mathsf{NC}^1$ implies that there cannot
exist PEOWFs $\{f_i : \{0,1\}^k \to \{0,1\}^k\}$ w.r.t. $Z$.

2. PEOWFs $\{f_i : \{0,1\}^k \to \{0,1\}^k\}$ w.r.t. this auxiliary input distribution $Z$
are implied by the existence of SNARKs for NP secure w.r.t. a second efficient
auxiliary input distribution $Z'$, as shown in [5].

3. Thus, one of these conflicting hypotheses must be false. That is, there exists an
efficient distribution $Z'$ such that, assuming the existence of FHE with decryption
in $\mathsf{NC}^1$ and collision-resistant hash functions, either: (1) public-coin
differing-inputs obfuscation for $\mathsf{NC}^1$ does not exist, or (2) SNARKs for NP
w.r.t. $Z'$ do not exist.

Note that we focus on the specific case of PEOWFs with $k$-bit inputs and
$k$-bit outputs, as this suffices to derive the desired contradiction; however, the
theorems that follow also extend to the more general case of PEOWF output length
(demonstrating an efficient distribution $Z$ to rule out each potential output
length $\ell(k)$).


Proximity EOWFs. We begin by defining Proximity EOWFs. 


Proximity Extractable One-Way Functions (PEOWFs). In a Proximity EOWF
(PEOWF), the extractable function family $\{f_i\}$ is associated with a “proximity”
equivalence relation $\sim$ on the range of $f_i$, and the one-wayness and extractability
properties are modified with respect to this relation. The one-wayness is
strengthened: not only must it be hard to find an exact preimage of $v$, but it is
also hard to find a preimage of any equivalent $v' \sim v$. The extractability requirement
is weakened accordingly: the extractor does not have to output an exact
preimage of $v$, but only a preimage of some equivalent value $v' \sim v$.

As an example, consider functions of the form $f : x \mapsto (f_1(x), f_2(x))$ and
equivalence relation on range elements $(a, b) \sim (a, b')$ whose first components
agree. Then the proximity extraction property requires for any adversary $\mathcal{A}$ who
outputs an image element $(a, b) \in \mathrm{Range}(f)$ that there exists an extractor $\mathcal{E}$
finding an input $x$ s.t. $f(x) = (a, b')$ for some $b'$ not necessarily equal to $b$.
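The example relation can be made concrete with a small Python sketch. Everything here is an assumption chosen for illustration: the input is split into two parts so that $f_1$ depends only on the first part, SHA-256 stands in for both component functions, and the function names are hypothetical. It shows only the relation test and what counts as a "related" preimage, not any security property.

```python
import hashlib


def f(x1: bytes, x2: bytes) -> tuple:
    """Toy function x -> (f_1(x), f_2(x)), where f_1 depends on x1 only."""
    a = hashlib.sha256(b"f1" + x1).digest()
    b = hashlib.sha256(b"f2" + x1 + b"|" + x2).digest()
    return (a, b)


def related(v: tuple, v_prime: tuple) -> bool:
    """Publicly testable relation: (a, b) ~ (a, b') iff first components agree."""
    return v[0] == v_prime[0]


def is_proximity_preimage(x: tuple, v: tuple) -> bool:
    """What a proximity extractor must achieve: f(x) ~ v, rather than f(x) == v."""
    return related(f(*x), v)
```

For instance, if the adversary outputs `v = f(b"k", b"r1")`, then `(b"k", b"r2")` is an acceptable extraction in the proximity sense even though `f(b"k", b"r2") != v`, because the first components agree.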

In this work, we allow the relation $\sim$ to depend on the function index $i$,
but require that the relation $\sim$ is publicly (and efficiently) testable. We further
consider non-uniform adversaries and extraction algorithms, and (in line with
this work) auxiliary inputs coming from a specified distribution $Z$.

Definition 5 ($Z$-Auxiliary-Input Proximity EOWFs). Let $\ell, m$ be polynomially
bounded length functions. An efficiently computable family of functions

$$\mathcal{F} = \left\{ f_i : \{0,1\}^k \to \{0,1\}^{\ell(k)} \;\middle|\; i \in \{0,1\}^{m(k)},\ k \in \mathbb{N} \right\},$$

associated with an efficient probabilistic key sampler $\mathcal{K}_{\mathcal{F}}$, is a $Z$-auxiliary-input
proximity extractable one-way function if it satisfies the following (strong)
one-wayness, (weak) extraction, and public testability properties:

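- (Strengthened) One-wayness: For non-uniform polynomial-time $\mathcal{A}$ and
sufficiently large security parameter $k \in \mathbb{N}$,

$$\Pr\left[ z \leftarrow Z_k;\ i \leftarrow \mathcal{K}_{\mathcal{F}}(1^k);\ x \leftarrow \{0,1\}^k;\ x' \leftarrow \mathcal{A}(i, f_i(x); z) \;:\; f_i(x') \sim f_i(x) \right] \leq \mathrm{negl}(k).$$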


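- (Weakened) Extractability: For any non-uniform polynomial-time adversary
$\mathcal{A}$, there exists a non-uniform polynomial-time extractor $\mathcal{E}$ such that, for
sufficiently large security parameter $k \in \mathbb{N}$,

$$\Pr\left[ z \leftarrow Z_k;\ i \leftarrow \mathcal{K}_{\mathcal{F}}(1^k);\ y \leftarrow \mathcal{A}(i;z);\ x' \leftarrow \mathcal{E}(i;z) \;:\; \exists x \text{ s.t. } f_i(x) = y \,\wedge\, f_i(x') \not\sim y \right] \leq \mathrm{negl}(k).$$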


- Publicly Testable Relation: There exists a deterministic polytime machine
$T$ such that, given the function index $i$, $T$ accepts $y, y' \in \{0,1\}^{\ell(k)}$ if and only
if $y \sim y'$.

($\mathsf{PC\text{-}diO}$ for $\mathsf{NC}^1$ + PC-CRHF + FHE + SNARK) $\Rightarrow$ No
$Z$-PEOWF. We now show that, assuming the existence of public-coin collision-resistant
hash functions (CRHF) and fully homomorphic encryption (FHE) with
decryption in $\mathsf{NC}^1$,$^{8}$ then for some efficiently computable distributions $Z_{\mathsf{SNARK}}$,
$Z_{\mathsf{PEOWF}}$, if there exist public-coin differing-inputs obfuscators for $\mathsf{NC}^1$ circuits,
and SNARKs w.r.t. auxiliary input $Z_{\mathsf{SNARK}}$, then there cannot exist PEOWFs
w.r.t. auxiliary input $Z_{\mathsf{PEOWF}}$. This takes place in two steps.

First, we remark that an identical proof to that of Theorem 5 rules out
the existence of $Z$-auxiliary-input proximity EOWFs in addition to standard
EOWFs, based on the same assumptions: namely, assuming public-coin differing-inputs
obfuscation for Turing machines, and public-coin collision-resistant hash
functions. Indeed, assuming the existence of a PEOWF extractor $\mathcal{E}$ for the adversary
$\mathcal{A}$ and auxiliary input distribution $Z$ (who extracts a “related” preimage to
the target value), the same procedure yields a PEOWF inverter who similarly
extracts a “related” preimage to any challenge output. In the reduction, it is
merely required that the success of $\mathcal{E}$ is efficiently and publicly testable (this is
used to construct a distinguishing adversary for the differing-inputs obfuscation
scheme, in Step 1). However, this is directly implied by the public testability of
the PEOWF relation as specified in Definition 5.

8 As is the case for nearly all existing FHE constructions (e.g., [13,21]).
Theorem 6. There exists an efficient, uniformly samplable distribution $Z$ such
that, assuming the existence of public-coin collision-resistant hash functions and
public-coin differing-inputs obfuscation for polynomial-size Turing machines,
there cannot exist (publicly testable) $Z$-auxiliary-input PEOWFs $\{f_i : \{0,1\}^k \to \{0,1\}^k\}$.

Now, in [30], it was shown that public-coin differing-inputs obfuscation for the
class of all polynomial-time Turing machines can be achieved by bootstrapping
up from public-coin differing-inputs obfuscation for circuits in the class $\mathsf{NC}^1$,
assuming the existence of FHE with decryption in $\mathsf{NC}^1$, public-coin CRHF, and
public-coin SNARKs for NP.

Putting this together with Theorem 6, we thus have the following corollary. 

Corollary 1. There exists an efficient, uniformly samplable distribution $Z$ s.t.,
assuming existence of public-coin SNARKs and FHE with decryption in $\mathsf{NC}^1$,
then assuming the existence of public-coin differing-inputs obfuscation for $\mathsf{NC}^1$,
there cannot exist PEOWFs $\{f_i : \{0,1\}^k \to \{0,1\}^k\}$ w.r.t. auxiliary input $Z$.

(SNARK + CRHF) $\Rightarrow$ $Z$-PEOWF. As shown in [5], Proximity EOWFs
(PEOWFs) with respect to an auxiliary input distribution $Z$ are implied by
collision-resistant hash functions (CRHF) and SNARKs secure with respect to
a related auxiliary input distribution $Z'$.$^{9}$

Loosely, the transformation converts any CRHF family $\mathcal{F}$ into a PEOWF by
appending to the output of each $f \in \mathcal{F}$ a succinct SNARK argument $\pi_x$ that
there exists a preimage $x$ yielding output $f(x)$. (If the Prover algorithm of the
SNARK system is randomized, then the function is also modified to take an
additional input, which is used as the random coins for the SNARK generation.)
The equivalence relation on outputs is defined by $(y, \pi) \sim (y', \pi')$ if $y = y'$ (note
that this relation is publicly testable). More explicitly, consider the new function
family $\mathcal{F}'$ composed of functions

$$f'_{\mathsf{crs}}(x; r) = \bigl(f(x),\ \mathsf{Prove}(1^k, \mathsf{crs}, f(x), x; r)\bigr),$$

where a function $f'_{\mathsf{crs}} \in \mathcal{F}'$ is sampled by first sampling a function $f \leftarrow \mathcal{F}$ from
the original CRHF family, and then sampling a CRS for the SNARK scheme,
$\mathsf{crs} \leftarrow \mathsf{CRSGen}(1^k)$.

Now (as proved in [5]), the resulting function family will be a PEOWF with
respect to auxiliary input $Z$ if the underlying SNARK system is secure with
respect to an augmented auxiliary input distribution $Z_{\mathsf{SNARK}} := (Z, h)$, formed
by concatenating a sample from $Z$ with a function index $h$ sampled from the
collision-resistant hash function family $\mathcal{F}$. (Note that we will be considering
public-coin CRHF, in which case $h$ is uniform.)

Theorem 7 ([5]). There exist efficient, uniformly samplable distributions
$Z$, $Z_{\mathsf{SNARK}}$ such that, assuming the existence of collision-resistant hash functions
and SNARKs for NP secure w.r.t. auxiliary input distribution $Z_{\mathsf{SNARK}}$, then there
exist PEOWFs $\{f_i : \{0,1\}^k \to \{0,1\}^k\}$ w.r.t. $Z$.


9 [5] consider the setting of arbitrary auxiliary input; however, their construction 
directly implies similar results for specific auxiliary input distributions. 
Reaching a Standoff. Observe that the conclusions of Corollary 1 and
Theorem 7 are in direct contradiction. Thus, it must be that one of the two 
sets of assumptions is false. Namely, 

Corollary 2. Assuming the existence of public-coin collision-resistant hash functions
and fully homomorphic encryption with decryption in $\mathsf{NC}^1$, there exists an
efficiently samplable distribution $Z_{\mathsf{SNARK}}$ such that one of the following two objects
cannot exist:

- SNARKs w.r.t. auxiliary input distribution $Z_{\mathsf{SNARK}}$.

- Public-coin differing-inputs obfuscation for $\mathsf{NC}^1$.

More explicitly, we have that $Z_{\mathsf{SNARK}} = (Z, U)$, where $Z$ is composed of an
obfuscated program, and $U$ is a uniform string (corresponding to a randomly
sampled index from a public-coin CRHF family).


Acknowledgements. The authors would like to thank Kai-Min Chung for several 
Abstract. It takes time for theoretical advances to get used in practical
schemes. Anonymous credential schemes are no exception. For instance, 
existing schemes suited for real-world use lack formal, composable defi- 
nitions, partly because they do not support straight-line extraction and 
rely on random oracles for their security arguments. To address this 
gap, we propose unlinkable redactable signatures (URS), a new building 
block for privacy-enhancing protocols, which we use to construct the first
efficient UC-secure anonymous credential system that supports multiple 
issuers, selective disclosure of attributes, and pseudonyms. Our scheme 
is one of the first such systems for which both the size of a credential 
and its presentation proof are independent of the number of attributes 
issued in a credential. Moreover, our new credential scheme does not rely 
on random oracles. As an important intermediary step, we address the 
problem of building a functionality for a complex credential system that 
can cover many different features. Namely, we design a core building 
block for a single issuer that supports credential issuance and presen- 
tation with respect to pseudonyms and then show how to construct a 
full-fledged credential system with multiple issuers in a modular way. 
We expect this definitional approach to be of independent interest. 


Keywords: Structure preserving signatures • Vector commitments • 
Anonymous credentials • Universal composability • Groth-Sahai proofs 


1 Introduction 

Digital signature schemes are a fundamental cryptographic primitive. Besides 
their use for signing digital items, they are used as building blocks in more 
complex cryptographic schemes such as blind signatures [6,42], group signa- 
tures [15,52], direct anonymous attestation [20], electronic cash [40], voting 
schemes [48], adaptive oblivious transfer [23,32], and anonymous credentials [12]. 
For protocols constructed like this to be efficient, special properties are
demanded from a signature scheme, in particular when the protocol needs to 
achieve strong privacy properties. The most important such properties seem to 
be that the issuance of a signature and its later use in a protocol is unlinkable as 
well as that the scheme is able to sign multiple messages (without employing a 
hash function). The first signature scheme that met these requirements is a blind 
signature scheme by Brands [18]. The drawback of blind signatures, however, is 
that when using the signature later in a higher-level protocol it must typically be 
revealed so that a third party can be convinced of its validity. Thus, a signature 
can be used only once, which turns out to be quite limiting for applications such 
as group signatures, multi-show anonymous credentials, and compact e-cash [25]. 

Camenisch and Lysyanskaya [30,31] were the first to design signature schemes 
(CL-signatures) overcoming these drawbacks. Their schemes are secure under the 
Strong RSA, the $q$-SDH, or the LRSW assumption, respectively, and allow for an
alternative approach when using a signature in a protocol: instead of revealing 
the signature to a party, the user employs zero-knowledge proofs to convince the 
party that she possesses a valid signature. While in theory this is possible for any 
signature scheme, CL-signatures were the first that enabled efficient proofs using 
generalized Schnorr proofs of knowledge. This is due to the algebraic properties of 
CL-signatures, i.e., no hash function is applied to the message and the signature 
and message values are either exponents or group elements. 

Since the introduction of CL-signatures, the area of privacy preserving pro- 
tocols flourished considerably and numerous new protocols based on them have 
been proposed. This has also made it apparent, however, that CL-signatures still 
have a number of drawbacks: 

1. Random oracles. To make generalized Schnorr proofs of knowledge non-interactive
(which is often required), one needs to resort to the Fiat-Shamir
heuristic, i.e., to the random oracle model, and thus loses all provable security
guarantees when the oracle is instantiated by a hash function [36].

2. Straight-line extraction. When designing a protocol to be secure in the UC 
model [35], rewinding cannot be used to prove security. As a result, witnesses
in generalized Schnorr proofs of knowledge need to be encrypted under a 
public key encoded in the common reference string (CRS). As the witnesses 
(messages signed with CL-signatures) are discrete logarithms, this is rather 
expensive [26] and may render the overall protocol impractical. 

3. Linear size. When proving ownership of a CL-signature on many messages, 
all of them are needed for the verification of the signature and therefore a 
proof of possession of a signature will be linear in the number of messages. 

A promising ingredient to overcome these drawbacks is the work by Groth 
and Sahai [45], who for the first time constructed efficient non-interactive zero- 
knowledge proofs (NIZK) without using random oracles, albeit for a limited set 
of languages. Indeed, the set of languages covered by these so-called GS-proofs 
does not include the ones covered by generalized Schnorr protocols and therefore 
many authors started to look for a compatible CL-signatures replacement, i.e., 
structure-preserving signature schemes [1-3]. Together with GS-proofs, these 
new schemes can also be used as signatures of knowledge [39] and thus are
applicable in scenarios previously addressed with CL-signatures. 

However, structure-preserving signatures still suffer in terms of performance 
when signing multiple messages (cf. drawback (3)), which is a typical requirement 
in applications such as anonymous credentials. Indeed, as for CL-signatures, the 
size of proofs with structure-preserving signatures grows linearly with the number
of messages. As the constant factor for GS-proofs is larger than for generalized
Schnorr proofs, structure-preserving signatures lose their attractiveness
as a building block for such applications.

Our Contribution. In this paper, our goal is to address the three drawbacks of 
CL-signatures discussed above. To this end, we propose a new type of signa- 
ture scheme, unlinkable redactable signatures (URS), in which one can redact 
message-signature pairs and reveal only their relevant parts each time they 
are used. Moreover, signatures in URS are unlinkable and the same message- 
signature pair can be redacted and revealed multiple times without being linked 
back to its origin. The real-world efficiency of URS is comparable to that of 
CL-signatures when a single message is signed and becomes superior when the 
number of messages signed grows. We view our contribution as threefold: First, in 
Sect. 2, we formally define URS. We present property-based security definitions 
for unlinkability and unforgeability and also a UC functionality for URS. Com- 
paring the two definitions we find the seemingly common phenomenon that the 
functionality-based definition requires a key-registration process (allowing for the 
extraction of keys in the proof) while the property-based definition per se does
not require that. We validate our definitions by showing that a URS scheme
satisfying strengthened property-based security definitions with key extraction 
securely implements our UC functionality. 

Second, in Sect. 3, we construct an efficient URS scheme from vector commit- 
ments [37,51,56], structure-preserving signatures [2,3], and (a minimal dose of) 
non-interactive proofs of knowledge (NIPoK), which in practice can be instanti- 
ated by witness-indistinguishable Groth-Sahai proofs [45]. As we are interested 
in practical efficiency, we instantiate our scheme with concrete building blocks
that deliberately rely on stronger assumptions (see Sect. 4.3). However, if one
is willing to accept a less efficient scheme, a CDH-based vector commitment
scheme [37], secure under weaker assumptions, can be used instead. We show how to make use of
algebraic properties in our building blocks to minimize the witness size of the 
NIPoK. 

Third, in Sect. 4, to demonstrate the versatility of our URS scheme as a 
CL-signature scheme ‘replacement’, we employ it to design the first efficient uni- 
versally composable anonymous credential system that supports multiple issuers, 
pseudonyms, and selective disclosure of attributes. 

Anonymous credential systems usually need to support an ecosystem of dif- 
ferent features. Therefore, a single ideal functionality providing all the features 
such as pseudonyms, selective attribute disclosure, predicates over attributes, 
revocation, inspection, etc. would be very complex and hard to both create and 
use in a modular way — not to mention credible security proofs. 
Nevertheless, ideal functionalities are very attractive for modeling the com-
plexity of anonymous credential schemes. Indeed an early seminal paper [29] 
attempted exactly this, but was foiled by drawback (2) — as well as by the imma- 
turity of the UC framework at the time. To overcome this complexity, we present 
a flexible and modular approach for constructing UC-secure anonymous creden- 
tials. Namely, we design a core building block for a single issuer that supports 
credential issuance and presentation with respect to pseudonyms. We then show 
how to compose multiple such blocks to construct in a modular way a full-fledged 
credential system with multiple issuers. 

Besides being composable, our system is also arguably one of the first schemes 
to support efficient non-interactive attribute disclosure with cost independent of 
the number of attributes issued without having to rely on random oracles. Even 
in the random oracle model this has been an elusive goal. Therefore, because 
of the composability and efficient selective disclosure, our scheme is very attractive
and quickly surpasses schemes based on blind signatures and CL-signatures
[19,31] when the number of attributes grows. 

Related Work. We compare our signatures and credential schemes with other
related work; a full comparison is deferred to the full paper [9]. As there are a
multitude of papers on redactable, quotable, and sanitizable signatures [7,21,46, 
58], we focus on the most influential definitional work and the most promising 
approaches in terms of efficiency. 

A variety of signature schemes with flexible signing capabilities and strong 
privacy properties have been proposed [7,8,10,14,17,34,38]. While these works 
provide a fresh definitional approach, their schemes are very inefficient, espe- 
cially when redacting a message vector with a large number of attributes. Some 
schemes built on vector commitments [51,55] achieve better efficiency but only 
consider one-time-show credentials, and while the scheme in [51] is not defined 
formally, the scheme in [55] involves interaction. 

The first efficient multi-show anonymous credential scheme is [29]. It was 
extended with efficient attribute disclosure [24] and had real-world exposure 
[20,33]. It can, however, only be non-interactive in the random oracle model. 
Non-interactive credentials in the standard model have been built from 
P-signatures [12,13]. An instantiation of our URS scheme, however, is almost
twice as efficient as [12] despite the fact that the latter does not support
signing multiple messages. Belenkiy et al. [11] show how to use the randomizability
of P-signatures for delegation and Chow et al. [41] extend the randomizable
group signatures scheme underlying [11] with a flexible attribute mechanism.
Izabachene et al. [50] extend the work of [12] with vector commitments;
their scheme is, however, not secure under our definitions. In independent work, 
Hanser and Slamanig [47] present a credential system with efficient (indepen- 
dent of the number of attributes) attribute disclosure. However, their system 
is only secure in the generic group model [43]. Furthermore, it uses a hash function
to encode attributes and thus does not enable efficient protocol design.
None of these schemes is (universally) composable. Camenisch et al. [27] have 
recently proposed property-based definitions of anonymous credentials and of 
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the necessary building blocks, given a construction and proved it secure. Their 
definitions turn out to be rather complex, indicating that for complex systems 
functionality-based definitions might be easier to handle. Indeed, for their def- 
inition of privacy, Camenisch et al. make use of what they call ‘filter’ which is 
very reminiscent of an ideal functionality. Finally, the construction they provide 
is based on CL-signatures and thus suffers from the drawbacks of that approach. 

An important factor that is often neglected is the compatibility of schemes 
with zero-knowledge proofs to enable efficient protocol design. Because of its 
compatibility with Groth-Sahai proofs, efficiency and composability, immediate 
further applications of our URS scheme include efficient e-cash, credential-based 
key exchange, e-voting, auditing, and others. 

2 Definitions of Unlinkable Redactable Signatures 

Redactable signatures are an instance of homomorphic [7] or controlled-malleable
signatures [38]. For our credentials application the most useful redaction
operation is to selectively white-list or quote a subset of messages and their
positions from a message vector of length $n$ ([7] consider the quoting of
sub-sentences). We denote the message space of all valid message vectors as
$\mathcal{M}$. We also refer to the redacted message as a quote of the original
message. To distinguish the original vector from the quote of all messages we
denote the original vector as $\vec{m} = (1, m_1, \ldots, m_n)$ and a quote as
$\vec{m}_I = (2, m'_1, \ldots, m'_n)$. We represent each valid quoting
transformation by a set $I \subseteq [1, n]$ of message positions and denote
quoting either by $I(\vec{m})$ or $\vec{m}_I$. We denote the $i$-th message
element either by $\vec{m}[i]$ or $m_i$. A quote $\vec{m}_I$ from $\vec{m}$ is
of the form

$$\vec{m}_I = (2, m'_1, \ldots, m'_n), \quad \text{with } m'_i = m_i \text{ for } i \in I \text{ and } m'_i = \bot \text{ for } i \notin I.$$

Note that the message itself already reveals whether it is a quote. Chase et al.
[38] call such a scheme tiered and we refer to the vectors $\vec{m}$ and
$\vec{m}_I$ as Tier 1 and Tier 2 vectors respectively. The vector $\vec{m}_I$
can be sparse and can have a much shorter encoding than $\vec{m}$. Finally, we
define $\mathsf{Zero}(\vec{m}, I) = (1, \hat{m}_1, \ldots, \hat{m}_n)$, with
$\hat{m}_j = m_j$ for $j \in I$ and $\hat{m}_j = 0$ for $j \notin I$. This
should not be confused with the operator $I$ that outputs a Tier 2 message.
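
The difference between quoting and Zero can be made concrete with a small sketch. The encoding below is an assumption made only for illustration: a vector is a pair (tier, entries), and `None` stands for a hidden position in a Tier 2 vector. The function names `quote` and `zero` are hypothetical.

```python
from typing import Optional, Set, Tuple

# Hypothetical encoding: (tier, entries); None marks a hidden position.
Vector = Tuple[int, Tuple[Optional[int], ...]]


def quote(m: Vector, I: Set[int]) -> Vector:
    """Quoting operator I(m): keep the 1-indexed positions in I, hide the rest.

    The result is tagged as a Tier 2 vector.
    """
    tier, entries = m
    if tier != 1:
        raise ValueError("only Tier 1 vectors can be quoted")
    kept = tuple(x if i in I else None for i, x in enumerate(entries, start=1))
    return (2, kept)


def zero(m: Vector, I: Set[int]) -> Vector:
    """Zero(m, I): replace positions outside I by 0; the result stays Tier 1."""
    tier, entries = m
    if tier != 1:
        raise ValueError("Zero is defined on Tier 1 vectors")
    kept = tuple(x if i in I else 0 for i, x in enumerate(entries, start=1))
    return (1, kept)
```

Quoting the zeroed vector at the same positions yields the same Tier 2 vector as quoting the original, which is the property the ideal functionality of Sect. 2.2 relies on when it re-signs Zero(m, I).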

2.1 Property-Based Definitions for Unlinkable 
Redactable Signatures 

One can define the security of redactable signatures by instantiating the
controlled-malleable signature definitions for simulatability, simulation
unforgeability, and simulation context-hiding of Chase et al. [38] with the
quoting transformation class $\mathcal{T} = \{I(\cdot) \mid I \subseteq [1..n]\}$
above. We prefer, however, to give our own unforgeability and unlinkability
definitions that are more specific and do not rely on simulation and
extraction. This makes them simpler and easier to prove, and thus more
efficiently realizable. Together with key extractability they are nevertheless
sufficient to realize the strong guarantees of our UC functionality.

Definition 1 (Unlinkable Redactable Signatures). An unlinkable redactable
signature scheme URS consists of the following algorithms:

URS.SGen($1^\kappa$) → SP. SGen takes the security parameter $1^\kappa$ as input
and outputs the system parameters SP.

URS.Kg(SP) → (pk, sk). Kg takes the system parameters SP as input and
outputs public verification and private signing keys (pk, sk). The verification
key pk defines the message space $\mathcal{M}$.

URS.Sign(sk, $\vec{m}$) → $\sigma$. Sign takes the signing key sk and a message
$\vec{m} \in \mathcal{M}$ as input and produces a signature $\sigma$.

URS.Derive(pk, $I$, $\vec{m}$, $\sigma$) → $\sigma_I$. Derive takes the public
key pk, a selection vector $I$, a message $\vec{m}$ and a signature $\sigma$
(both of Tier 1) as input. It produces a Tier 2 signature $\sigma_I$ for
$\vec{m}_I$.

URS.Verify(pk, $\sigma$, $\vec{m}$) → 0/1. Verify takes the verification key pk,
a signature $\sigma$, and a message $\vec{m}$ of Tier 1 or Tier 2 as input and
checks the signature.

We omit the URS qualifier when it is clear from the context.

Correctness. Informally, correctness requires that for honestly generated keys, 

both honestly generated and honestly derived signatures must always verify. 


Unforgeability. Unforgeability captures the requirement that an attacker, who is 
given Tier 1 and Tier 2 signatures on messages of his choice, should not be able 
to produce a signature on a message that is not derivable from the set of signed 
messages in his possession. More formally: 


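Definition 2 (Unforgeability). Let $H$ output unique handles, for instance
implemented using a counter. For a redactable signature scheme URS.{SGen,
Kg, Sign, Derive, Verify}, tables $Q_1, Q_2, Q_3$, and an adversary
$\mathcal{A}$, consider the following game:

- Step 1. $SP \leftarrow \mathsf{SGen}(1^\kappa)$; $(pk, sk) \leftarrow \mathsf{Kg}(SP)$; $Q_1, Q_2, Q_3 \leftarrow \emptyset$.

- Step 2. $(\vec{m}^*_I, \sigma^*) \leftarrow \mathcal{A}^{\mathcal{O}_{\mathsf{Sign}}(\cdot),\, \mathcal{O}_{\mathsf{Derive}}(\cdot,\cdot),\, \mathcal{O}_{\mathsf{Reveal}}(\cdot)}(pk)$,
  where $\mathcal{O}_{\mathsf{Sign}}$, $\mathcal{O}_{\mathsf{Derive}}$, and
  $\mathcal{O}_{\mathsf{Reveal}}$ behave as follows:

  $\mathcal{O}_{\mathsf{Sign}}(\vec{m})$:
      $h \leftarrow H$; $\sigma \leftarrow \mathsf{Sign}(sk, \vec{m})$
      add $(h, \vec{m}, \sigma)$ to $Q_1$
      return $h$

  $\mathcal{O}_{\mathsf{Derive}}(h, I)$:
      if $(h, \vec{m}, \sigma) \in Q_1$
          $\sigma' \leftarrow \mathsf{Derive}(pk, I, \vec{m}, \sigma)$
          add $\vec{m}_I$ to $Q_2$; return $\sigma'$

  $\mathcal{O}_{\mathsf{Reveal}}(h)$:
      if $(h, \vec{m}, \sigma) \in Q_1$
          add $\vec{m}$ to $Q_3$
          return $\sigma$

A signature scheme URS satisfies unforgeability if for all such PPT algorithms
$\mathcal{A}$ there exists a negligible function $\nu(\cdot)$ such that in the
above game the probability (over the random choices of Kg, Sign, Derive and
$\mathcal{A}$) that $\mathsf{Verify}(pk, \sigma^*, \vec{m}^*_I) = 1$ and
$\forall \vec{m} \in Q_3:\ \vec{m}^*_I \neq I(\vec{m})$, and
$\vec{m}^*_I \notin Q_2$ is less than $\nu(\kappa)$.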
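
The bookkeeping of this game can be sketched as a small harness. This is a minimal illustration under stated assumptions, not a reference implementation: the scheme is passed in as three callables (`sign`, `derive`, `verify`), vectors are plain tuples without the tier tag, `None` marks a hidden position, and all class and method names are hypothetical.

```python
import itertools
from typing import Optional, Set, Tuple

Vec = Tuple[Optional[int], ...]  # None marks a hidden position (assumption)


def quote(m: Vec, I: Set[int]) -> Vec:
    """Quote I(m) with 1-indexed positions; hidden entries become None."""
    return tuple(x if i in I else None for i, x in enumerate(m, start=1))


def positions(m_I: Vec) -> Set[int]:
    """Recover the position set I that a Tier 2 vector encodes."""
    return {i for i, x in enumerate(m_I, start=1) if x is not None}


class UnforgeabilityGame:
    def __init__(self, pk, sk, sign, derive, verify):
        self.pk, self.sk = pk, sk
        self.sign, self.derive, self.verify = sign, derive, verify
        self.handles = itertools.count()  # H: unique handles via a counter
        self.Q1 = {}      # handle -> (m, sigma)
        self.Q2 = set()   # quotes handed out by the Derive oracle
        self.Q3 = []      # messages whose Tier 1 signature was revealed

    def o_sign(self, m: Vec) -> int:
        h = next(self.handles)
        self.Q1[h] = (m, self.sign(self.sk, m))
        return h

    def o_derive(self, h: int, I: Set[int]):
        if h not in self.Q1:
            return None
        m, sigma = self.Q1[h]
        self.Q2.add(quote(m, I))
        return self.derive(self.pk, I, m, sigma)

    def o_reveal(self, h: int):
        if h not in self.Q1:
            return None
        m, sigma = self.Q1[h]
        self.Q3.append(m)
        return sigma

    def adversary_wins(self, m_star: Vec, sigma_star) -> bool:
        """Winning condition: valid signature on a quote that is not derivable."""
        I = positions(m_star)
        fresh = m_star not in self.Q2 and all(
            quote(m, I) != m_star for m in self.Q3
        )
        return fresh and bool(self.verify(self.pk, sigma_star, m_star))
```

A security proof would bound the probability that `adversary_wins` returns true for any efficient adversary; the harness only records which quotes count as trivially derivable.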


Note that we do not consider a Tier 1 signature itself as a forgery. However,
if the adversary manages to produce a valid Tier 1 signature on a message
$\vec{m}$ without calling Sign($\vec{m}$) and either Reveal($h$) or
Derive($h$, $I$) on all subsets $I \subseteq [1..n]$ for the corresponding
handle $h$, he can use this Tier 1 signature to break unforgeability by
deriving a Tier 2 signature from it.

Unlinkability. Informally, unlinkability ensures that an adversarial signer cannot
distinguish which of two Tier 1 signatures of his choosing was used to derive a
Tier 2 signature. More formally:

Definition 3 (Unlinkability). For the signature scheme URS.{SGen, Kg, Sign,
Derive, Verify} and a stateful adversary $\mathcal{A}$, consider the following game:

- Step 1. $SP \leftarrow \mathsf{SGen}(1^\kappa)$.

- Step 2. $(pk, I, \vec{m}^{(0)}, \vec{m}^{(1)}, \sigma^{(0)}, \sigma^{(1)}) \leftarrow \mathcal{A}(SP)$,
  where $\vec{m}^{(0)}_I = \vec{m}^{(1)}_I$,
  $\mathsf{Verify}(pk, \sigma^{(0)}, \vec{m}^{(0)}) = 1$, and
  $\mathsf{Verify}(pk, \sigma^{(1)}, \vec{m}^{(1)}) = 1$.

- Step 3. Pick $b \leftarrow \{0, 1\}$ and form
  $\sigma^{(b)}_I \leftarrow \mathsf{Derive}(pk, I, \vec{m}^{(b)}, \sigma^{(b)})$.

- Step 4. $b' \leftarrow \mathcal{A}(\sigma^{(b)}_I)$.

The signature scheme URS is unlinkable if for any polynomial time adversary
$\mathcal{A}$ there exists a negligible function $\nu(\cdot)$ such that
$\Pr[b = b'] < 1/2 + \nu(\kappa)$.

Note that this definition is very strong, as the adversary can even pick pk. 
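
One run of the unlinkability game can be sketched as follows. This is an illustrative harness only, assuming the scheme is supplied as callables and the stateful adversary is split into two hypothetical callbacks, `choose` (Step 2) and `guess` (Step 4). Vectors are plain tuples and `None` marks a hidden position.

```python
import secrets
from typing import Optional, Set, Tuple

Vec = Tuple[Optional[int], ...]  # None marks a hidden position (assumption)


def quote(m: Vec, I: Set[int]) -> Vec:
    """Quote I(m) with 1-indexed positions; hidden entries become None."""
    return tuple(x if i in I else None for i, x in enumerate(m, start=1))


def unlinkability_experiment(derive, verify, choose, guess, SP) -> bool:
    """Run Steps 2-4 once and report whether the adversary guessed b."""
    # Step 2: the adversary picks pk, I, two messages and two signatures.
    pk, I, m0, m1, s0, s1, state = choose(SP)
    if quote(m0, I) != quote(m1, I):
        raise ValueError("the two messages must agree on the quoted positions")
    if not (verify(pk, s0, m0) and verify(pk, s1, m1)):
        raise ValueError("both Tier 1 signatures must verify")
    # Step 3: the challenger derives from one of the two at random.
    b = secrets.randbelow(2)
    sigma_I = derive(pk, I, (m0, m1)[b], (s0, s1)[b])
    # Step 4: the adversary outputs its guess b'.
    return guess(state, sigma_I) == b
```

For an unlinkable scheme the returned value should be true with probability only negligibly above 1/2; a scheme whose derived signatures leak hidden entries lets `guess` succeed every time.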


2.2 Ideal Functionality for Unlinkable Redactable Signatures 

We now give an alternative characterization of unlinkable redactable signatures
using an ideal functionality $\mathcal{F}_{URS}$ defined as follows:


Functionality $\mathcal{F}_{URS}$

The functionality maintains tables $\mathcal{K}$ and $\mathcal{Q}$ initialized
to $\emptyset$ and flags kg and keyleak, which are initially unset.

- On input (keygen, sid) from S, verify that sid = (S, sid') for some sid' and
  that flag kg is unset. If not, then return $\bot$. Else, send (initF, sid) to
  SIM and wait for a message (initF, sid, SP, Kg, Sign, Derive, Verify) from
  SIM, where Kg, Sign, and Derive are PPT algorithms and Verify is a
  deterministic algorithm. Then, store SP, Kg, Sign, Derive, and Verify, run
  (pk, sk) ← Kg(SP), set flag kg, store (pk, sk) in $\mathcal{K}$, and return
  (verificationKey, sid, pk) to S.

- On input (checkPK, sid, pk') from some party P, verify that the flag kg is
  set. Check whether pk' = pk or whether (pk', sk') for some sk' was stored
  in $\mathcal{K}$. In this case, return (checkedPK, sid, true). Else, if
  (pk', $\bot$) was stored in $\mathcal{K}$ return (checkedPK, sid, false).
  Else, send (checkPK, sid, pk') to SIM, wait for (checkedPK, sid, sk') from
  SIM, add (pk', sk') to $\mathcal{K}$. If sk' $\neq \bot$, return
  (checkedPK, sid, true) to P. Otherwise, return (checkedPK, sid, false) to P.

- On input (leakSK, sid) from S verify that sid = (S, sid') for some sid'. If
  not, return $\bot$. Else, if flag kg is set, set flag keyleak and return
  (leakSK, sid, sk), otherwise abort.

- On input (sign, sid, $\vec{m}$) from S, verify that sid = (S, sid') for some
  sid' and that the flag kg is set. If not, return $\bot$. Else, run
  $\sigma$ ← Sign(sk, $\vec{m}$) and Verify(pk, $\sigma$, $\vec{m}$). If Verify
  is successful, return (signature, sid, $\vec{m}$, $\sigma$) to S and add
  $\vec{m}$ to $\mathcal{Q}$, otherwise return $\bot$.

- On input (derive, sid, pk', $I$, $\vec{m}$, $\sigma$) from some party P, run
  Derive(pk', $I$, $\vec{m}$, $\sigma$) and if it fails, return $\bot$.
  Otherwise, if the flag kg is set and pk = pk' then set $sk_{tmp}$ = sk. If
  there is an entry (pk', sk') $\in \mathcal{K}$ recorded, set $sk_{tmp}$ = sk'.
  If $sk_{tmp}$ was set, run $\sigma'$ ← Sign($sk_{tmp}$, Zero($\vec{m}$, $I$))
  and return Derive(pk', $I$, Zero($\vec{m}$, $I$), $\sigma'$). Otherwise,
  return the output of Derive(pk', $I$, $\vec{m}$, $\sigma$).

- On input (verify, sid, pk', $\sigma$, $\vec{m}_I$) from some party P, compute
  result ← Verify(pk', $\sigma$, $\vec{m}_I$). If the flag kg is set, pk' = pk,
  flag keyleak is not set, and $\nexists\, \vec{m} \in \mathcal{Q}$ such that
  $\vec{m}_I = I(\vec{m})$, then output (verified, sid, $\vec{m}_I$, 0).
  Otherwise, output (verified, sid, $\vec{m}_I$, result).


We point out some aspects of the ideal functionality. The functionality needs
to output concrete values as signatures of messages and redacted signatures, as
well as key material. To generate and verify these values, $\mathcal{F}_{URS}$
requires the adversary/simulator SIM to provide it with a number of
polynomial-time algorithms. This is in line with how ideal functionalities for
signatures, and in particular blind signatures, have been defined before
[6,35,42,49,53]. We consider static corruptions of protocol machines, but allow
the simulator to request the signing key at any time by sending the leakSK
message. This allows us to ensure that the privacy properties for users are
still enforced even if the signer leaks its secret key. The functional and
security properties are enforced by the functionality no matter how these
(adversarial) algorithms compute the values. Unforgeability is enforced by the
fact that $\mathcal{F}_{URS}$ will output false (0) for verification queries for
which the message (or a corresponding original message) has not been signed,
provided that the signer is not corrupted and the signing key not leaked. (If
the signer is corrupted statically, (keygen, sid) will not be sent and hence kg
not set and unforgeability not enforced.) Unlinkability of redacted signatures
is enforced by $\mathcal{F}_{URS}$ as follows. It generates a fresh redacted
signature only from those parts of the original message that are quoted, i.e.,
the hidden message parts are set to zero, and thus redacted signatures from
$\mathcal{F}_{URS}$ do not contain any information about the hidden parts of
messages. More precisely, this is enforced for the keys generated by
$\mathcal{F}_{URS}$ and for any keys that an honest party successfully checked
before generating a redacted signature. Unless mentioned otherwise, the reply
of the functionality upon a failed check or verification is $\bot$.
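
The two enforcement rules described above (verification is overridden for unsigned messages, and derivation works on a zeroed copy) can be sketched as plain functions. This is an illustrative fragment of the functionality's bookkeeping only, with hypothetical names; vectors are tuples without tier tags and `None` marks a hidden position.

```python
from typing import List, Optional, Set, Tuple

Vec = Tuple[Optional[int], ...]  # None marks a hidden position (assumption)


def quote(m: Vec, I: Set[int]) -> Vec:
    return tuple(x if i in I else None for i, x in enumerate(m, start=1))


def zero(m: Vec, I: Set[int]) -> Vec:
    return tuple(x if i in I else 0 for i, x in enumerate(m, start=1))


def positions(m_I: Vec) -> Set[int]:
    return {i for i, x in enumerate(m_I, start=1) if x is not None}


def ideal_verify(result: int, kg: bool, pk_is_honest: bool,
                 keyleak: bool, Q: List[Vec], m_I: Vec) -> int:
    """Verification rule: force 0 when the key is honest and unleaked and
    no signed message in Q has m_I as its quote; otherwise pass result on."""
    I = positions(m_I)
    signed = any(quote(m, I) == m_I for m in Q)
    if kg and pk_is_honest and not keyleak and not signed:
        return 0
    return result


def message_used_for_derive(m: Vec, I: Set[int], sk_tmp_known: bool) -> Vec:
    """Derivation rule: if a signing key is known for pk', the functionality
    re-signs Zero(m, I), so hidden entries never enter the derived signature."""
    return zero(m, I) if sk_tmp_known else m
```

The first rule is what makes forgeries impossible in the ideal world; the second is what makes derived signatures independent of the hidden message parts.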


2.3 Key Registration and UC Realizability 

We now want to construct a protocol $\mathcal{R}_{URS}$ that realizes
$\mathcal{F}_{URS}$ using a URS scheme in the $\mathcal{F}_{CRS}$-hybrid model
where SP is the reference string and each call to $\mathcal{F}_{URS}$ is
essentially replaced by running one of the algorithms of URS. While this can be
done (the detailed description of $\mathcal{R}_{URS}$ is given in the full
version [9]), there are a number of hurdles that need to be overcome. These
hurdles are quite typical and include, e.g., that we need to be able to extract
the secret keys from the adversary to be able to simulate properly. They are,
however, often treated only informally in security proofs. Here we want to make
them explicit and treat them formally. So our goal is to prove a statement
(Theorem) of the form:

If URS is correct, unforgeable, unlinkable, and X, then $\mathcal{R}_{URS}$
securely realizes $\mathcal{F}_{URS}$ in the $\mathcal{F}_{CRS}$-hybrid model.

What do we have to require from X to make this theorem true? To prove the
theorem we have to show indistinguishability between the ideal world and the
real world. In the ideal world, an environment $\mathcal{Z}$ interacts with the
simulator SIM and functionality $\mathcal{F}_{URS}$. In the real world, the
environment $\mathcal{Z}$ interacts with the real adversary $\mathcal{A}$ and
the protocol $\mathcal{R}_{URS}$.

We provide a tentative description of SIM in the ideal world: when receiving
the (initF, sid) message from $\mathcal{F}_{URS}$, it generates a trapdoor td
(in addition to SP) and returns (initF, sid, SP, Kg, Sign, Derive, Verify). On
receiving the (checkPK, sid, pk) message, it uses the trapdoor to extract the
secret key sk and returns sk to $\mathcal{F}_{URS}$.

To make this work, we extend URS with several algorithms: CheckPK is run
by $\mathcal{R}_{URS}$ on receiving a message (checkPK, sid, pk). SGenT and
ExtractKey are the trapdoored parameter generation and key extraction
algorithms for SIM. CheckKeys is used to define what it means to extract a
valid key.

URS.CheckPK(pk) → 0/1. CheckPK is a deterministic algorithm that takes a
public key pk as input and checks that it is correctly formed. It outputs 1
if pk is correct, and 0 otherwise.

URS.SGenT($1^\kappa$) → (SP, td). SGenT is a system parameters generation
algorithm that takes the security parameter $1^\kappa$ as input and outputs the
system parameters SP and a trapdoor td for the key extraction algorithm.

URS.ExtractKey(pk, td) → sk. ExtractKey is an algorithm that takes a public key
pk and a trapdoor td as input. It extracts the corresponding secret key sk.

URS.CheckKeys(pk, sk) → 0/1. CheckKeys is an algorithm that takes a public key
pk and a private key sk and checks if they constitute a valid signing key pair.
It outputs 1 if they do, and 0 otherwise.

Strengthened Correctness requires that not only honestly generated keys, but
also keys for which the predicate CheckKeys(pk, sk) holds, can be used to create
signatures that will verify. It moreover guarantees that CheckPK(pk) holds for
honestly generated public keys.

Parameter Indistinguishability. Informally, parameter indistinguishability
ensures that the SP produced by SGenT and SGen are computationally
indistinguishable. It is formally defined as follows:

Definition 4 (Parameter Indistinguishability). A redactable signature
scheme URS.{SGen, Kg, Sign, Derive, Verify} with alternative parameter
generation SGenT is parameter indistinguishable if for any polynomial time
adversary $\mathcal{A}$ there exists a negligible function $\nu(\cdot)$ such that

$$\Pr[(SP_0, td) \leftarrow \mathsf{SGenT}(1^\kappa);\ SP_1 \leftarrow \mathsf{SGen}(1^\kappa);\ b \leftarrow \{0, 1\};\ b' \leftarrow \mathcal{A}(SP_b) : b = b'] < 1/2 + \nu(\kappa).$$

Key Extractability. Informally, key extractability ensures that if SGenT was
run and if CheckPK outputs 1, then the extraction algorithm ExtractKey(pk, td)
will output a valid secret key sk, i.e., CheckKeys(pk, sk) = 1.

Definition 5 (Key Extractability). A redactable signature scheme URS.{SGen,
Kg, Sign, Derive, Verify} with additional algorithms (CheckPK, SGenT,
CheckKeys, ExtractKey) is key extractable if CheckKeys is correct and for any
polynomial time adversary $\mathcal{A}$ there exists a negligible function
$\nu(\cdot)$ such that

$$\Pr[(SP, td) \leftarrow \mathsf{SGenT}(1^\kappa);\ pk \leftarrow \mathcal{A}(SP, td);\ sk \leftarrow \mathsf{ExtractKey}(pk, td) : \mathsf{CheckPK}(pk) = 1 \wedge \mathsf{CheckKeys}(pk, sk) = 0] < \nu(\kappa).$$

Composable Unlinkability holds even when parameters in the unlinkability game
are generated using (SP, td) ← SGenT($1^\kappa$) and $\mathcal{A}$ is handed td.
This allows for the use of the game in a hybrid argument when proving the
security of the simulator. We note that in such an adapted unlinkability game
the trapdoor td must only enable key extraction, and crucially does not allow
the adversary to extract a Tier 1 signature from a Tier 2 signature (this would
break unlinkability). In our instantiation this is achieved by splitting SP
into several parts. The trapdoor is only generated for the part used for key
extraction.

UC Realization. We prove that if an unlinkable redactable signature URS is
correct, parameter indistinguishable, key extractable, unforgeable, and
unlinkable, then $\mathcal{R}_{URS}$ securely realizes $\mathcal{F}_{URS}$. More
formally, we have the following theorem (which is proven in the full version of
this paper [9]).

Theorem 1. Let URS be an unlinkable redactable signature scheme. If URS is
correct, parameter indistinguishable, key extractable, unforgeable, and
composable unlinkable, then $\mathcal{R}_{URS}$ securely realizes
$\mathcal{F}_{URS}$ in the $\mathcal{F}_{CRS}$-hybrid model.

3 The Construction of Our Redactable Signature Scheme 

As a first step toward our full solution, we will construct an unforgeable and 
unlinkable URS scheme without key extraction. The scheme should be of inde- 
pendent interest, in case universal composability is not a design requirement. 
This isolation of key extraction, seemingly only needed for universal composi- 
tion, is a nice feature of our definitions. 

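Let $\mathcal{G}$ be a bilinear group generator that takes as input a security
parameter $1^\kappa$ and outputs the description of multiplicative groups
grp = $(p, \mathbb{G}, \tilde{\mathbb{G}}, \mathbb{G}_t, e, G, \tilde{G})$ where
$\mathbb{G}$, $\tilde{\mathbb{G}}$, and $\mathbb{G}_t$ are groups of prime order
$p$, $e$ is an efficient, non-degenerate bilinear map
$e : \mathbb{G} \times \tilde{\mathbb{G}} \to \mathbb{G}_t$, and $G$ and
$\tilde{G}$ are generators of the groups $\mathbb{G}$ and $\tilde{\mathbb{G}}$,
respectively.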

Our construction makes use of a structure-preserving signature (SPS) scheme
SPS.{Kg, Sign, Verify} and a vector commitment scheme VC.{Setup, Commit,
Open, Check}. We recall that the structure-preserving property of the signature
scheme requires that verification keys, messages, and signatures are group
elements and the verification predicate is a conjunction of pairing product
equations. The intuition behind our construction is deceptively simple: Use
SPS.Kg to generate a signing key pair and VC.Setup to add commitment parameters
to the public key. To sign a vector $\vec{m}$, first commit to $\vec{m}$ and
then sign the resulting commitment $C$. To derive a quote for a subset $I$ of
the messages, simply open the commitment to the messages in $\vec{m}_I$. We
verify a signature on a quote by verifying both the structure-preserving
signature (SPS.Verify) and checking the opening of the commitment (VC.Check).

Such a scheme has, however, several shortcomings. First, it is linkable, as
the same commitment is reused across multiple quotes of the same message.
Even if both the underlying SPS scheme and the commitment scheme are
individually re-randomizable, this seems hard to avoid as the unforgeability of
the SPS scheme prevents randomization of the message. Second, such a
construction is only heuristically secure. Existing vector commitments do not
guarantee that multiple openings cannot be combined and mauled into an opening
for a different sub-vector. We call vector commitment schemes that prevent this
opening non-malleable. (Recently, [47] constructed an SPS scheme allowing for
randomization within an equivalence class. However, their commitments cannot be
opened to arbitrary vectors over $\mathbb{Z}_p$ and are not provably opening
non-malleable.)

Our main design goal is to address both of these weaknesses while avoiding a 
large performance overhead. Our main tool for this is an efficient non-interactive 
proof-of-knowledge. Intuitively, we hide the commitments and their openings, as 
well as a small part of the signature to achieve unlinkability. Hiding the com- 
mitment opening also helps solve the malleability problems for commitments. To 
achieve real-world efficiency we show how to exploit the re-randomizability of the 
SPS (and optionally the commitment scheme as described in the full version [9]). 

Before describing our redactable signature scheme in more detail, we present 
a vector commitment scheme that uses a variant of polynomial commitments 
from [51]. While our changes are partly cosmetic, they simplify the assumption 
needed for opening non-malleability. 


3.1 Vector Commitments Simplified 

A vector of messages $\vec{m} \in \mathbb{Z}_p^n$ is committed using a
polynomial $f(x)$ that has a value $f(i) = m_i$ at the position $i$. In
Lagrange form such a polynomial is a linear combination
$f(x) = \sum_{i=1}^{n} m_i f_i(x)$ of Lagrange basis polynomials
$f_i(x) = \prod_{j=0, j \neq i}^{n} \frac{x - j}{i - j}$. To batch-open a vector
commitment for a position set $I \subseteq \{1, \ldots, n\}$, one uses a
polynomial $f_I(x) = \sum_{i \in I} m_i f_i(x)$. For such a polynomial, it holds
that $f_I(i) = m_i$ for $i \in I$, and $f_I(0) = 0$. (The additional root at 0
is necessary to achieve opening non-malleability.) The reuse of the same
Lagrange basis polynomials, which yields polynomials of not the lowest possible
degree, reduces the number of variable bases in the equation of Check below and
increases efficiency when used for the construction of bigger protocols such as
anonymous credentials. Also, note that $f(x) - f_I(x)$ is divisible by the
polynomial $p_I(x) = x \cdot \prod_{i \in I}(x - i)$. We use the polynomial
$p(x) = x \cdot \prod_{i=1}^{n}(x - i)$, which is divisible by $p_I(x)$ for any
$I \subseteq \{1, \ldots, n\}$, to randomize commitments to make them perfectly
hiding.

Construction. We reuse the notation of Sect. 2 and use Tier 1 vectors $\vec{m}$
for the vectors being committed and Tier 2 vectors $\vec{m}_I$ for batch
openings at positions $I$. We also let
grp = $(p, \mathbb{G}, \tilde{\mathbb{G}}, \mathbb{G}_t, e, G, \tilde{G})$ be
bilinear map parameters generated by a bilinear group generator
$\mathcal{G}(1^\kappa)$.

VC.Setup(grp). Pick $\alpha \leftarrow \mathbb{Z}_p$ and compute
$(G_1, \tilde{G}_1, \ldots, G_{n+1}, \tilde{G}_{n+1})$, where
$G_i = G^{(\alpha^i)}$ and $\tilde{G}_i = \tilde{G}^{(\alpha^i)}$. Output
parameters pp = (grp, $G_1, \tilde{G}_1, \ldots, G_{n+1}, \tilde{G}_{n+1}$).
Values $G_1, \ldots, G_{n+1}$ suffice to compute $G^{\phi(\alpha)}$ for any
polynomial $\phi(x)$ of maximum degree $n + 1$ (and similarly for
$\tilde{G}^{\phi(\alpha)}$).

Furthermore, for the above defined $f_i(x)$, $p(x)$, and $p_I(x)$, we implicitly
define $F_i = G^{f_i(\alpha)}$, $P = G^{p(\alpha)}$, $P_I = G^{p_I(\alpha)}$, and
$\tilde{P}_I = \tilde{G}^{p_I(\alpha)}$. These group elements can be computed
from the parameters pp.

VC.Commit(pp, $\vec{m}$, $r$). Output $C = \prod_{i=1}^{n} F_i^{m_i} P^r$.

VC.Open(pp, $I$, $\vec{m}$, $r$). Let
$w(x) = \frac{f(x) - f_I(x) + r \cdot p(x)}{p_I(x)}$ and compute the witness
$W = G^{w(\alpha)}$ using parameters pp.

VC.Check(pp, $C$, $\vec{m}_I$, $W$). Accept if
$e(C, \tilde{G}) = e(W, \tilde{P}_I) \cdot e(\prod_{i \in I} F_i^{m_i}, \tilde{G})$.

Note that $p_I(x)$ always has the factor $x$. This is essential for achieving
opening non-malleability. If $p_I(x)$ were 1 for $I = \emptyset$, as in the
original polynomial commitment scheme of [51], then $C$ would be a valid batch
opening witness for the empty set of messages.
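
The polynomial identities behind the commitment can be checked numerically. The sketch below works with coefficient lists modulo a toy prime; the modulus `P` and all function names are assumptions made for illustration and have nothing to do with an actual pairing group order. It builds the Lagrange basis over the nodes 0..n, the batch polynomial for a position set, and the vanishing polynomial with the extra root at 0.

```python
P = 2**61 - 1  # toy prime modulus (assumption), not a real group order


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (low degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out


def poly_add(a, b):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [(x + y) % P for x, y in zip(a, b)]


def poly_eval(a, x):
    """Horner evaluation modulo P."""
    acc = 0
    for c in reversed(a):
        acc = (acc * x + c) % P
    return acc


def lagrange_basis(i, n):
    """f_i(x) = prod over j in 0..n, j != i, of (x - j) / (i - j)."""
    num, den = [1], 1
    for j in range(n + 1):
        if j != i:
            num = poly_mul(num, [(-j) % P, 1])
            den = den * (i - j) % P
    inv = pow(den, -1, P)  # modular inverse, Python 3.8+
    return [c * inv % P for c in num]


def batch_poly(m, I):
    """f_I(x) = sum over i in I of m_i * f_i(x); m_i is stored at m[i - 1]."""
    n = len(m)
    f = [0]
    for i in I:
        f = poly_add(f, [m[i - 1] * c % P for c in lagrange_basis(i, n)])
    return f


def vanishing(I):
    """p_I(x) = x * prod over i in I of (x - i)."""
    out = [0, 1]
    for i in I:
        out = poly_mul(out, [(-i) % P, 1])
    return out
```

Since the roots of the vanishing polynomial are distinct, checking that f minus f_I evaluates to zero at 0 and at every position in I is equivalent to checking divisibility by p_I.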


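Security Analysis. We require the commitment scheme to be complete, batch
binding, and opening non-malleable. Completeness is standard for a commitment
scheme and follows easily from the following equation:

$$e(C, \tilde{G}) = e(G, \tilde{G})^{f(\alpha) + r \cdot p(\alpha)} = e\big(G^{\frac{f(\alpha) - f_I(\alpha) + r \cdot p(\alpha)}{p_I(\alpha)}}, \tilde{P}_I\big) \cdot e(G, \tilde{G})^{f_I(\alpha)} = e(W, \tilde{P}_I) \cdot e\big(\textstyle\prod_{i \in I} F_i^{m_i}, \tilde{G}\big).$$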


Next, we define the batch binding and opening non-malleability properties: 


Definition 6 (Batch Binding). For a vector commitment scheme VC.{Setup,
Commit, Open, Check} and an adversary $\mathcal{A}$ consider the following game:

- Step 1. grp ← $\mathcal{G}(1^\kappa)$ and pp ← VC.Setup(grp)

- Step 2. $C, \vec{m}_I, W, \vec{m}'_{I'}, W' \leftarrow \mathcal{A}(pp)$


Then, the commitment scheme satisfies batch binding if for all such PPT
algorithms $\mathcal{A}$ there exists a negligible function $\nu(\cdot)$ such
that the probability (over the choices of $\mathcal{G}$, Setup, and
$\mathcal{A}$) that 1 = VC.Check(pp, $C$, $\vec{m}_I$, $W$) =
VC.Check(pp, $C$, $\vec{m}'_{I'}$, $W'$) and that there exists
$i \in I \cap I'$ such that $m_i \neq m'_i$ is at most $\nu(\kappa)$. (Note that
$\vec{m}_I$ and $\vec{m}'_{I'}$ are Tier 2 vectors, and thus encode the sets $I$
and $I'$ respectively.)

Definition 7 (Opening Non-malleability). For a vector commitment scheme
VC.{Setup, Commit, Open, Check} and an adversary $\mathcal{A}$ consider the
following game:

- Step 1. grp ← $\mathcal{G}(1^\kappa)$ and pp ← VC.Setup(grp)

- Step 2. $\vec{m}, I \leftarrow \mathcal{A}(pp)$

- Step 3. Pick random $r$, compute $C$ ← VC.Commit(pp, $\vec{m}$, $r$),
  and $W$ ← VC.Open(pp, $I$, $\vec{m}$, $r$).

- Step 4. $W', I' \leftarrow \mathcal{A}(C, W)$

Then the commitment scheme satisfies opening non-malleability if for all such
PPT algorithms $\mathcal{A}$ there exists a negligible function $\nu(\cdot)$
such that the probability (over the choices of $\mathcal{G}$, Setup, Commit, and
$\mathcal{A}$) that 1 = VC.Check(pp, $C$, $\vec{m}_{I'}$, $W'$) and $I \neq I'$
is at most $\nu(\kappa)$.

In the following theorems we make use of the n-BSDH assumption [44] and
the n-RootDH assumption, which are defined next. See the full version of this
paper [9] for a generic group model proof of the n-RootDH assumption. (We note
that the n-RootDH assumption is only required for opening non-malleability,
which is ignored by most existing constructions of anonymous credentials from
vector commitments.)

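Definition 8 (n-SDH Assumption). The n-strong Diffie-Hellman (n-SDH)
assumption [16] states that there exists a $\mathcal{G}$ such that for all
algorithms $\mathcal{A}$, the following advantage is negligible:

$$\mathrm{Adv}^{\mathrm{SDH}}_{\mathcal{G}}(\mathcal{A}) = \Pr[(p, \mathbb{G}, \tilde{\mathbb{G}}, \mathbb{G}_t, e, G, \tilde{G}) \leftarrow \mathcal{G}(1^\lambda);\ x \leftarrow \mathbb{Z}_p^* : \mathcal{A}(1^\lambda, p, G, \tilde{G}, G^x, \ldots, G^{x^n}) = (c, G^{1/(x+c)})] < \mathrm{negl}(\lambda).$$

The n-BSDH assumption is defined identically to n-SDH except that now
$\mathcal{A}$ is challenged to compute $(c, e(G, \tilde{G})^{1/(x+c)})$. Note
that the n-BSDH assumption is already implied by the n-SDH assumption.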

Definition 9 (n-RootDH Assumption). For an adversary $\mathcal{A}$ consider the
following game:

- Step 1. grp ← $\mathcal{G}(1^\kappa)$

- Step 2. Pick random $\alpha, r \leftarrow \mathbb{Z}_p^*$; compute
  $X = (G^{\alpha \prod_{i=1}^{n}(\alpha - i)})^r$.

- Step 3. $(J, state) \leftarrow \mathcal{A}(G, \tilde{G}, \{G^{\alpha^i}, \tilde{G}^{\alpha^i}\}_{i=1}^{n+1}, X)$

- Step 4. Compute $Y$ = .

- Step 5. $J', Z \leftarrow \mathcal{A}(state, Y)$

Then the group generator $\mathcal{G}$ satisfies the n-RootDH assumption if for
all such PPT algorithms $\mathcal{A}$ there exists a negligible function
$\nu(\cdot)$ such that the probability (over the choices of $\mathcal{G}$,
$\alpha$, $r$, and $\mathcal{A}$) that $J$ and $J'$ are subsets of $[1..n]$,
$J \neq J'$, and $Z = (G^{\prod_{i \in J'}(\alpha - i)})^r$ is at most
$\nu(\kappa)$.

Theorem 2. The commitment scheme VC defined above is batch binding under 
the (n + 1)-BSDH assumption. The proof is similar to that of [51] and can be 
found in the full version [9]. 

Theorem 3. The commitment scheme VC defined above is opening non-malleable
under the n-RootDH assumption. The proof can be found in the full
version [9].




3.2 Non-interactive Zero-Knowledge and Witness Indistinguishable 
Proof Systems 

Let $R$ be an efficiently computable binary relation. For pairs
$(W, Stmt) \in R$ we call $Stmt$ the statement and $W$ the witness. Let
$\mathcal{L}$ be the language consisting of statements in $R$. A non-interactive
zero-knowledge (NIZK) proof-of-knowledge system for a language $\mathcal{L}$
consists of the following algorithms and protocols:

$\Pi$.Setup(grp) → CRS. On input grp ← $\mathcal{G}(1^\lambda)$, it outputs
common parameters (a common reference string) CRS for the proof system.

$\Pi$.Prove(CRS, $W$, $Stmt$) → $\pi$. On input a statement $Stmt$ and a witness
$W$, it generates a zero-knowledge proof $\pi$ that the witness satisfies the
statement.

$\Pi$.Verify(CRS, $\pi$, $Stmt$) → 0/1. On input $Stmt$ and $\pi$, it outputs 1
if $\pi$ is valid, and 0 otherwise.

We explain the notation for the statements $Stmt$. We call witnesses
extractable ($f$-extractable [12]) if they can be fully (only partially)
extracted from the corresponding proof. To express the “extractability”
property of the witnesses we use notation introduced by Camenisch et al. [28].
For the extractable witnesses we use the “knowledge” notation (Ʞ), and for the
$f$-extractable witnesses we use the “existence” ($\exists$) notation. (If the
function $f$ is constant, nothing can be extracted.) We define $\mathfrak{K}$ as
the set of extractable witnesses and $\mathfrak{E}$ as the set of witnesses for
which we can only prove existence. We only consider proofs for
multi-exponentiation equations (for existence) and pairing product equations
(for existence and knowledge). The following is an exemplary statement:

$$Stmt = Ʞ\, \{Y_i, \tilde{Y}_i \in \mathfrak{K}\}_{i=1}^{n};\ \exists\, \{x_j \in \mathfrak{E}\}_{j=1}^{m} : z = \prod_{j=1}^{m} G_j^{x_j} \ \wedge\ e(G, \tilde{G}) = \prod_{i=1}^{n} \big(e(Y_i, \tilde{B}_i) \cdot e(A_i, \tilde{Y}_i)\big).$$

For simplicity of presentation, we do not explicitly specify public values of a
statement as additional input to the algorithms, since they are clear from the
description of the statement and the list of witnesses.

We employ different proof systems that are either witness indistinguishable
or zero-knowledge in terms of privacy, and either extractable or
simulation-extractable in terms of soundness. For the security proofs we
introduce the following algorithms:

$\Pi$.ExtSetup(grp) → (CRS, $td_{ext}$). On input grp, it outputs a common
reference string CRS and a trapdoor $td_{ext}$ for extraction of valid
witnesses from valid proofs. This is for witness-indistinguishable extractable
proofs.

$\Pi$.SimSetup(grp) → (CRS, $td_{ext}$, $td_{sim}$). It outputs a CRS and the
extraction and simulation trapdoors. This is for proofs that are also
zero-knowledge.

$\Pi$.SimProve(CRS, $td_{sim}$, $Stmt$) → $\pi$. On input CRS and a trapdoor
$td_{sim}$, it outputs a simulated proof $\pi$ such that
$\Pi$.Verify(CRS, $\pi$, $Stmt$) = 1.

$\Pi$.Extract(CRS, $td_{ext}$, $\pi$, $Stmt$) → $W$. On input a proof $\pi$ and
a trapdoor $td_{ext}$, it extracts a witness $W$ that satisfies the statement
$Stmt$ of the proof $\pi$.

For simulation-extractable NIZK proofs (that are non-malleable) we also
allow an additional public input to the Prove, Extract, SimProve, and Verify
algorithms: a message (label) $L$, which is non-malleably attached to the proof
(i.e., the signature of knowledge is computed on this message). We provide a
formal definition below.

Definition 10 (Simulation Extractability). A proof system $\Pi$ is called
simulation extractable with labels if for any PPT adversary $\mathcal{A}$ and
security parameter $\lambda$ there exists a negligible function negl($\cdot$)
such that:

$$\Pr[(CRS, td_{sim}, td_{ext}) \leftarrow \mathsf{SimSetup}(1^\lambda);\ (Stmt^*, L^*, \pi) \leftarrow \mathcal{A}^{\mathcal{O}_{Sim}(td_{sim}, \cdot, \cdot)}(CRS);\ W \leftarrow \mathsf{Extract}(CRS, td_{ext}, \pi, Stmt^*, L^*) :$$
$$\mathsf{Verify}(CRS, \pi, Stmt^*, L^*) = 1 \ \wedge\ (W, Stmt^*) \notin R \ \wedge\ \mathcal{O}_{Sim} \text{ was never queried with } (Stmt^*, L^*)] < \mathrm{negl}(\lambda).$$

3.3 Our Redactable Signature Scheme 

We construct our redactable signature scheme URS from a structure-preserving
signature scheme SPS, a vector commitment scheme VC, and an extractable and
witness-indistinguishable non-interactive proof-of-knowledge system $\Pi$
described in the previous section. Some SPS and vector commitment schemes might
also support randomization; we already discussed such a property for vector
commitments in the last sub-section; for signatures we refer the reader to
[2,3]. We denote the randomization algorithm of signatures by SPS.Rand. We
denote the randomizable elements of an SPS signature $\Sigma$ by
$\psi_{rnd}(\Sigma)$ and the other elements by $\psi_{wit}(\Sigma)$. (For a
non-randomizable SPS signature, $\psi_{wit}(\Sigma) = \Sigma$.)

Our construction does not utilize any randomizability in the vector commitment
scheme itself. In the full version [9] we analyze batch binding and opening
non-malleability in the presence of such a randomization algorithm.

Construction. 

$\mathsf{URS.SGen}(1^\kappa)$. Compute $grp \leftarrow \mathcal{G}(1^\kappa)$, $pp \leftarrow \mathsf{VC.Setup}(grp)$,
$CRS \leftarrow \Pi.\mathsf{Setup}(grp)$, output $SP = (grp, pp, CRS)$.

$\mathsf{URS.Kg}(SP)$. Obtain $grp$ from $SP$, generate $(pk_{sps}, sk_{sps}) \leftarrow \mathsf{SPS.Kg}(grp)$,
output $pk = (pk_{sps}, SP)$ and $sk = (sk_{sps}, pk)$.

$\mathsf{URS.Sign}(sk, m)$. Pick $r \leftarrow \mathbb{Z}_p$, compute $C = \mathsf{VC.Commit}(pp, m, r)$ and
$\Sigma \leftarrow \mathsf{SPS.Sign}(sk_{sps}, C)$, and return $\sigma = (\Sigma, C, r)$.

$\mathsf{URS.Derive}(pk, I, m, \sigma)$. First, compute $W = \mathsf{VC.Open}(pp, I, m, r)$. Then, if an
SPS.Rand algorithm is present, randomize the signature as
$\Sigma' \leftarrow \mathsf{SPS.Rand}(pk_{sps}, \Sigma)$; otherwise, set $\Sigma' \leftarrow \Sigma$. Then compute the proof

$$\pi \leftarrow \Pi.\mathsf{Prove}\big(CRS;\ (C, W, \psi_{wit}(\Sigma'));\ \exists\, C, W, \psi_{wit}(\Sigma') : \mathsf{SPS.Verify}(pk_{sps}, \Sigma', C) \wedge \mathsf{VC.Check}(pp, C, m_I, W)\big).$$

Return $\sigma_I = (\psi_{rnd}(\Sigma'), \pi)$ as the signature on $m_I$.

$\mathsf{URS.Verify}(pk, \sigma_I, m_I)$. Check that

$$\Pi.\mathsf{Verify}\big(CRS;\ \pi;\ \exists\, C, W, \psi_{wit}(\Sigma') : \mathsf{SPS.Verify}(pk_{sps}, \Sigma', C) = 1 \wedge \mathsf{VC.Check}(pp, C, m_I, W) = 1\big) = 1.$$

Theorem 4. URS is an unforgeable redactable signature scheme, if the SPS
scheme is unforgeable, the vector commitment scheme satisfies the batch binding
and opening non-malleability properties, and the proof-of-knowledge system is
extractable and witness indistinguishable. The proof of Theorem 4 is provided
in the full version [9].

Theorem 5. URS is an unlinkable redactable signature scheme if the
proof-of-knowledge system is witness indistinguishable. The proof is given in the full
version of this paper [9].
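
The data flow of Sign, Derive and Verify can be illustrated with a deliberately
simplified Python sketch. It is not the URS construction: a list of salted hashes
stands in for the vector commitment VC, an HMAC stands in for the signature SPS
(so verification here needs the signing key), and there is no proof system, so the
derived signature is neither unlinkable nor does it hide the commitment. All
function names are hypothetical.

```python
import hashlib
import hmac
import os

def _leaf(salt: bytes, value: str) -> bytes:
    # Salted hash of one message position (toy stand-in for one VC slot).
    return hashlib.sha256(salt + value.encode()).digest()

def commit(messages, salts):
    # Toy "vector commitment": the salted leaves and a digest C over them.
    leaves = [_leaf(s, m) for s, m in zip(salts, messages)]
    return leaves, hashlib.sha256(b"".join(leaves)).digest()

def sign(sk: bytes, messages):
    # Tier 1: commit to the whole vector, then authenticate the commitment C.
    salts = [os.urandom(16) for _ in messages]
    _, c = commit(messages, salts)
    tag = hmac.new(sk, c, hashlib.sha256).digest()
    return {"tag": tag, "salts": salts}

def derive(messages, sigma, index_set):
    # Tier 2: open only the positions in index_set; every other position is
    # represented by its salted leaf instead of its value.
    leaves, _ = commit(messages, sigma["salts"])
    opened = {i: (messages[i], sigma["salts"][i]) for i in index_set}
    hidden = {i: leaves[i] for i in range(len(messages)) if i not in index_set}
    return {"tag": sigma["tag"], "opened": opened, "hidden": hidden}

def verify(sk: bytes, derived) -> bool:
    # Rebuild every leaf, recompute C, and check the tag on C.
    n = len(derived["opened"]) + len(derived["hidden"])
    leaves = []
    for i in range(n):
        if i in derived["opened"]:
            value, salt = derived["opened"][i]
            leaves.append(_leaf(salt, value))
        elif i in derived["hidden"]:
            leaves.append(derived["hidden"][i])
        else:
            return False
    c = hashlib.sha256(b"".join(leaves)).digest()
    expected = hmac.new(sk, c, hashlib.sha256).digest()
    return hmac.compare_digest(expected, derived["tag"])
```

In the actual scheme the commitment, the opening and part of the signature are
never shown to the verifier; they are the witness of the proof $\pi$.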

Strengthened Scheme for a Universally Composable Construction. To be able
to satisfy the UC functionality, we require an additional key-extraction property.
We thus build an augmented redactable signature scheme URS from a redactable
signature scheme URS* (without key extraction) and a zero-knowledge
non-interactive proof-of-knowledge system $\Pi^*$.

$\mathsf{URS.SGen}(1^\kappa)$. Run $SP^* \leftarrow \mathsf{URS^*.SGen}(1^\kappa)$, get $grp$ from $SP^*$, run
$CRS_{sk} \leftarrow \Pi^*.\mathsf{Setup}(grp)$, and output $SP = (SP^*, CRS_{sk})$.

$\mathsf{URS.Kg}(SP)$. Obtain $SP^*$ and $CRS_{sk}$ from $SP$, sample randomness $r$, and run
$(pk^*, sk^*) \leftarrow \mathsf{URS^*.Kg}(SP^*; r)$. Compute the proof

$$\pi_{sk} \leftarrow \Pi^*.\mathsf{Prove}\big(CRS_{sk};\ (sk^*, r);\ \exists\, sk^*, r : (pk^*, sk^*) = \mathsf{URS^*.Kg}(SP^*; r)\big).$$

Output $pk = (SP, pk^*, \pi_{sk})$ and $sk = (sk^*, pk)$. We note that
$\mathsf{URS^*.Kg}(SP^*; r)$ fixes the randomness of the key generation algorithm.

$\mathsf{URS.CheckPK}(SP, pk)$. Check that
$\Pi^*.\mathsf{Verify}\big(CRS_{sk};\ \pi_{sk};\ \exists\, sk^*, r : (pk^*, sk^*) = \mathsf{URS^*.Kg}(SP^*; r)\big) = 1$.

Sign, Derive, Verify are almost unchanged and use $pk^*$ internally. SGenT and
ExtractKey use the extraction setup and extractor of the proof system
respectively, while CheckKeys checks that the relation $R$ holds for $pk$ and $sk$.
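
The remark that key generation is run with fixed randomness $r$ means that it
becomes a deterministic function of $r$, so the relation between $(pk^*, sk^*)$ and $r$
is something that can be recomputed and checked. A minimal sketch with a toy
discrete-log key pair follows. The group parameters, the hash-to-exponent step
and the function names are illustrative assumptions, not the key generation of
the paper, and the check here sees $sk$ and $r$ in the clear instead of through a
proof.

```python
import hashlib

# Toy group parameters (assumption, far too small for real use):
# P is prime, Q = (P - 1) // 2 is prime, G generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4

def kg(r: bytes):
    # Deterministic key generation: all randomness comes from r.
    sk = int.from_bytes(hashlib.sha256(r).digest(), "big") % Q
    pk = pow(G, sk, P)
    return pk, sk

def check_keys(pk: int, sk: int) -> bool:
    # Relation R between a public key and a secret key.
    return pow(G, sk, P) == pk

def check_keygen(pk: int, sk: int, r: bytes) -> bool:
    # The statement behind the key proof, checked here in the clear.
    return kg(r) == (pk, sk)
```

In the strengthened scheme the same statement is proven in zero knowledge, and
the extractor of the proof system recovers the secret key from the proof.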

Note that Groth-Sahai proofs can be used to implement key extraction by
proving a binary or $n$-ary decomposition of the secret key [57]. But this comes at
a huge cost of more than 61,000 group elements at 128-bit security, even if this
cost is only incurred once by every user per public key. We propose instead to use
fully structure-preserving signatures (FSPS) [5] such that $sk$ consists of group
elements and can be easily extracted. FSPS for signing single group elements can
be as cheap as 15 elements per signature and proofs of key possession consist of
just 18 elements.

Theorem 6. The strengthened scheme URS is an unforgeable, unlinkable, and
key extractable redactable signature scheme, if the underlying redactable signature
scheme URS* is unforgeable and unlinkable, and the proof-of-knowledge system
$\Pi^*$ is zero-knowledge and extractable.

Unforgeability and unlinkability are corollaries of Theorem 4 and Theorem 5.
Key-extractability follows directly from the extractability of the proof system.

Signing Group Elements as Additional Parts of the Message. While the presented
redactable signature scheme can sign and quote a large number of values
very efficiently, in certain applications, like the one presented in the next section,
one might also need to sign a small number of additional group elements. In the
Derive algorithm these elements will either be part of the derived message, and
given in the clear after derivation, or be treated as part of the witness, i.e.,
hidden from the verifier. The detailed construction and the security proofs are
given in the full version [9].

4 From Unlinkable Redactable Signatures to Anonymous 
Credentials 

As we designed our UC-secure URS scheme as a building block for privacy- 
preserving protocols, anonymous credentials are a natural application. Indeed, an 
(unlinkable) redactable signature scheme is already a simple selective-disclosure 
credential system where the attributes issued to users are the messages signed 
in Tier 1 signatures and a user can later reveal a subset of her attributes by 
deriving a Tier 2 signature. However, in an anonymous credential system, users
also require secret keys and pseudonyms (pseudonymous public keys), on which
credentials can be issued and with respect to which credentials can be presented.
This allows users to prove that they possess several credentials issued by
different parties on the same secret key [19,31].

In this section, we extend the functionality of URS in two ways: (1) we bind
Tier 1 signatures to user secret keys in a way that prevents the derivation of
signatures without knowledge of the secret and (2) we bind Tier 2 derived
signatures to a unique context, $ctx$ (nonce), to prevent replay attacks in which
an attacker shows the same derived signature twice.
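
The second extension can be illustrated by how a verifier would use the context:
it picks a fresh nonce, and a presentation is only accepted under the nonce it
was bound to. In this minimal sketch an HMAC label stands in for the
non-malleable attachment of the context to the proof; the key, the helper names
and the session flow are assumptions for illustration only and do not model the
zero-knowledge proof.

```python
import hashlib
import hmac
import os

def present(proof_key: bytes, attributes: str, ctx: bytes) -> bytes:
    # Stand-in for a Tier 2 presentation bound to the context ctx.
    message = ctx + b"|" + attributes.encode()
    return hmac.new(proof_key, message, hashlib.sha256).digest()

def accept(proof_key: bytes, attributes: str, ctx: bytes, proof: bytes) -> bool:
    # The verifier recomputes the binding under the context it chose itself.
    return hmac.compare_digest(present(proof_key, attributes, ctx), proof)

key = os.urandom(32)
ctx_first, ctx_second = os.urandom(16), os.urandom(16)
proof = present(key, "age>=18", ctx_first)
```

A presentation created for `ctx_first` is rejected when replayed in a session
that uses `ctx_second`.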

We first recall the algorithms of a multi-issuer anonymous credential system 
and then provide an instantiation using URS. To be modular and to simplify the 
analysis, we then provide an ideal functionality for a single issuer. The function- 
ality is carefully designed to self-compose naturally into a full-fledged credential 
system with multiple issuers. Finally, we provide a concrete instantiation of our 
generic construction and analyze its efficiency. 


4.1 Algorithms of Our Anonymous Credential System 

Let us first introduce the parties and the algorithms of a multi-issuer anonymous
credential system supporting user attributes (cf. [19,31]). Its protagonists are
users ($\mathcal{U}$), issuers ($\mathcal{I}$), and verifiers ($\mathcal{V}$). Each user has a secret key $X$, from
which she can derive (cryptographic) pseudonyms $P$. To get a credential issued,
a user sends to the issuer a pseudonym $P$ together with a (non-interactive) proof
$\pi_{X,P}$ that she is privy to the underlying secret key. The issuer will then issue her
a credential $Cred$ on $P$ containing the attributes $a$ the issuer vouches for. The
user can then present the credential to a verifier under a potentially different
pseudonym $P'$ by sending, together with $P'$, a (non-interactive) proof $\pi_{X,Cred}$
that she possesses a credential on the attributes $a_I$. Recall that $I$ defines which
attributes shall be revealed.

A credential system Cred defines a set of algorithms: a system parameters
generation algorithm SGen; an issuer setup algorithm Kg; a user secret generation
algorithm SecGen; algorithms for pseudonym generation and verification NymGen
and NymVerify, respectively; an algorithm to request a credential RequestCred; an
algorithm for issuing a credential IssueCred; an algorithm to check a newly issued
credential for correctness CheckCred; an algorithm to show a credential with
respect to a pseudonym (to create a credential proof) Prove; and an algorithm
to verify a credential proof Verify.
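
As a reading aid, the algorithm list can be written down as an interface. The
sketch below uses Python's `typing.Protocol`; the method names mirror the
algorithm names above, while the argument names and the opaque `Any` types are
assumptions, since the precise inputs are only fixed in Table 1 and in the full
version.

```python
from typing import Any, Protocol, Tuple

class CredentialSystem(Protocol):
    # Setup
    def sgen(self, security_parameter: int) -> Any: ...
    def kg(self, sp: Any) -> Tuple[Any, Any]: ...            # returns (sk, pk)
    def sec_gen(self, sp: Any) -> Any: ...                   # returns user secret X
    def nym_gen(self, sp: Any, x: Any) -> Tuple[Any, Any]: ...   # returns (P, aux)
    def nym_verify(self, sp: Any, x: Any, p: Any, aux: Any) -> bool: ...

    # Issuing
    def request_cred(self, sp: Any, pk: Any, x: Any, p: Any, aux: Any) -> Any: ...
    def issue_cred(self, sp: Any, sk: Any, p: Any, attributes: Any,
                   request: Any) -> Any: ...
    def check_cred(self, sp: Any, pk: Any, x: Any, p: Any, aux: Any,
                   cred: Any, aux_cred: Any, attributes: Any) -> bool: ...

    # Presentation
    def prove(self, sp: Any, pk: Any, x: Any, p: Any, aux: Any, cred: Any,
              aux_cred: Any, attributes: Any, index_set: Any, ctx: Any) -> Any: ...
    def verify(self, sp: Any, pk: Any, p: Any, proof: Any,
               revealed: Any, ctx: Any) -> bool: ...
```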

A more detailed discussion of these algorithms is given in the full version [9].
We instantiate these algorithms by adding support for user secrets, pseudonyms
and contexts to our redactable signature scheme. Besides the URS algorithms,
we use pseudonym generation and verification algorithms based on a
structure-preserving commitment scheme SPC and a hard relation to generate
credential-specific secrets. A hard relation has a generator $K_{R_{gap}}$ that generates a
witness $X_{Cred}$ and a public value $Y_{Cred}$, and a verification algorithm $V_{R_{gap}}$, such
that it is easy to verify $(X_{Cred}, Y_{Cred})$ but hard to compute $X_{Cred}$ from $Y_{Cred}$.
This hardness is used to prevent a network adversary that observes the issuing
protocol from impersonating the user.
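
A hard relation in this sense can be illustrated with a toy discrete-logarithm
relation: the generator outputs a witness and a public value, and verification
is a single exponentiation. The group parameters and function names are
assumptions for illustration; the instantiation discussed later relies on a
Diffie-Hellman type relation over pairing groups, not on this toy.

```python
import secrets

# Toy safe-prime group (assumption, far too small for real use):
# G generates the subgroup of prime order Q in the integers modulo P.
P, Q, G = 2039, 1019, 4

def gen_relation():
    # Generator: sample a witness x_cred and publish y_cred = G^x_cred mod P.
    x_cred = secrets.randbelow(Q - 1) + 1
    return x_cred, pow(G, x_cred, P)

def verify_relation(x_cred: int, y_cred: int) -> bool:
    # Verification: easy once the witness is known.
    return pow(G, x_cred, P) == y_cred
```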

Table 1 gives the construction of our credential scheme. We group the core 
credential algorithms into those used for setup, issuing and presentation. In our 
security definition and the proof we will make use of additional algorithms for 
simulation and extraction. 


4.2 Ideal Functionality for Credentials 

To tame the complexity of definitions for credential systems with many different
issuers, we chose to give a definition $\mathcal{F}_{Cred}$ of a scheme for a single issuer only,
which can then be used as a building block for a full-fledged credential system
with multiple issuers. The single-issuer functionality $\mathcal{F}_{Cred}$ will just allow users
to get a credential on a pseudonym from the issuer and to prove ownership of a
credential by the issuer w.r.t. a given pseudonym.

Table 1. Algorithms of our credential system

$\mathsf{Cred.SGen}(1^\lambda)$: Compute $SP_{URS} \leftarrow \mathsf{URS.SGen}(1^\lambda)$; $CRS_X \leftarrow \Pi.\mathsf{Setup}(1^\kappa)$;
    $pp_{SPC} \leftarrow \mathsf{SPC.Setup}(SP_{URS})$; and output $SP = (SP_{URS}, CRS_X, pp_{SPC})$.
$\mathsf{Cred.Kg}(SP)$: Compute $(pk_{URS}, sk_{URS}) \leftarrow \mathsf{URS.KeyGen}(SP_{URS})$, and output
    $(sk, pk) = (sk_{URS}, pk_{URS})$.
$\mathsf{Cred.SecGen}(SP)$: Take $G$ from $SP$, pick random $x \leftarrow \mathbb{Z}_p$, $X = G^x$. Output $X$.
$\mathsf{Cred.NymGen}(SP, X)$: $(P, O) \leftarrow \mathsf{SPC.Commit}(pp_{SPC}, X)$. Output $(P, aux(P) = O)$.
$\mathsf{Cred.NymVerify}(SP, X, P, aux(P))$: Parse $aux(P)$ as $O$. Output the result of
    $\mathsf{SPC.Check}(pp_{SPC}, P, O)$.

$\mathsf{Cred.RequestCred}(SP, pk, X, P, aux(P))$:
    $(X_{Cred}, Y_{Cred}) \leftarrow K_{R_{gap}}$; $\pi_{X,P} \leftarrow \Pi.\mathsf{Prove}(CRS_X; (X, X_{Cred}, aux(P)); Stmt_P)$,
    where $Stmt_P = (\exists\, X, aux(P) : \mathsf{NymVerify}(SP, X, P, aux(P)) = 1)$.
    Add $X_{Cred}, Y_{Cred}, P, aux(P)$ to $aux(Cred)$ and $Y_{Cred}$ to $\pi_{X,P}$.
$\mathsf{Cred.IssueCred}(SP, sk, P, a, \pi_{X,P})$:
    1. Verify the request for issuance $\pi_{X,P}$:
       If $\Pi.\mathsf{Verify}(CRS_X; \pi_{X,P}; Stmt_P) = 0$, return $\bot$.
    2. Else, generate a credential by creating a Tier 1 signature on the vector of
       messages, providing the pseudonym and a gap problem challenge, and
       calling $\sigma \leftarrow \mathsf{URS.Sign}(sk, (P, Y_{Cred}, a))$ and output $Cred = \sigma$.
$\mathsf{Cred.CheckCred}(SP, pk, X, P, aux(P), Cred, aux(Cred), a)$: Output the result of
    $\mathsf{URS.Verify}(pk, Cred, (P, Y_{Cred}, a))$.

$\mathsf{Cred.Prove}(SP, pk, X, P', aux(P)', Cred, aux(Cred), a, I, ctx) \rightarrow \pi_{X,Cred}$:
    1. Obtain $X_{Cred}, Y_{Cred}, P, aux(P)$ from $aux(Cred)$ and $\sigma$ from $Cred$.
    2. Run $\sigma_I \leftarrow \mathsf{URS.Derive}(pk, I, (P, Y_{Cred}, a), \sigma)$.
    3. Compute a proof of knowledge of the secret, pseudonym, where the context
       is non-malleably attached as a label to the proof:
       $\pi_{X,Cred} = \Pi.\mathsf{Prove}(CRS_X; (X, P, aux(P), aux(P)', Y_{Cred}, X_{Cred}); Stmt, ctx)$;
       $Stmt = (\exists\, X, P, aux(P), aux(P)', Y_{Cred}, X_{Cred} :$
           $\mathsf{NymVerify}(SP, X, P', aux(P)') = 1 \wedge \mathsf{NymVerify}(SP, X, P, aux(P)) = 1$
           $\wedge\ \mathsf{URS.Verify}(pk, \sigma_I, (P, Y_{Cred}, a)_I) = 1 \wedge V_{R_{gap}}(X_{Cred}, Y_{Cred}) = 1)$.
       Add $\sigma_I$ to $\pi_{X,Cred}$ as a part of the public input.
$\mathsf{Cred.Verify}(SP, pk, P', \pi_{X,Cred}, a_I, ctx)$: Output the result of
    $\Pi.\mathsf{Verify}(CRS_X, \pi_{X,Cred}, Stmt(SP, P', \sigma_I, a_I), ctx)$.

$\mathsf{Cred.SGenT}(1^\kappa)$: $(SP_{URS}, td) \leftarrow \mathsf{URS.SGenT}(1^\kappa)$;
    $(CRS_X, td_{ext}, td_{sim}) \leftarrow \Pi.\mathsf{SimSetup}(1^\kappa)$; $pp_{SPC} \leftarrow \mathsf{SPC.Setup}(SP_{URS})$.
    Output $(SP = (SP_{URS}, CRS_X, pp_{SPC}),\ td_{ext} = (td, td_{ext}),\ td_{sim})$.
$\mathsf{Cred.Extract}(SP, td_{ext}, pk, P', \pi_{X,Cred}, a_I, ctx)$:
    Take $(X, aux(P))$ from $\Pi.\mathsf{Extract}(SP; td_{ext}; \pi_{X,Cred}; Stmt)$.
$\mathsf{Cred.SimProve}(SP, sk, td_{sim}, P', a_I, ctx) \rightarrow \pi_{X,Cred}$:
    1. $X \leftarrow \mathsf{SecGen}(SP)$; $(P, aux(P)) \leftarrow \mathsf{NymGen}(SP, X)$.
    2. Let $a_0$ be a Tier 1 message restored from $a_I$ by replacing $\bot$-s with 0-s as if
       it was derived from the original message $a$ by applying $\mathsf{Zero}(a, I)$.
    3. $\sigma \leftarrow \mathsf{URS.Sign}(sk, (P, Y_{Cred}, a_0))$.
    4. $\sigma_I \leftarrow \mathsf{URS.Derive}(pk, I, (P, Y_{Cred}, a_0), \sigma)$.
    5. Compute a proof of knowledge of the secret, pseudonym, and the
       correctness of the signature on a context:
       $\pi_{X,Cred} \leftarrow \Pi.\mathsf{SimProve}(CRS_X; td_{sim}; P', \sigma_I, Stmt, ctx)$.

To serve as a secure building block, $\mathcal{F}_{Cred}$ must be carefully designed. On the
one hand it must deal with the unforgeability of credentials and on the other
hand it must provide the hooks such that colluding users cannot mix and match
credentials from different issuers. To address the former, $\mathcal{F}_{Cred}$ binds issued
credentials to the respective user’s secret key $X$, and for the latter $\mathcal{F}_{Cred}$ will enforce
that credential proofs will not verify w.r.t. a pseudonym $P$ unless a corresponding
credential got issued to the $X$ underlying that pseudonym. Then, provided
adversarial users are unable to provide different $X$’s for the same pseudonym,
credentials from different issuers issued to different users (i.e., different $X$’s)
cannot be matched. As a consequence of this, the generation of user secret keys and
pseudonyms is not done inside $\mathcal{F}_{Cred}$, but users are required to input their secret
key $X$ and the pseudonym $P$ (as cryptographic values) to $\mathcal{F}_{Cred}$ on the calls to get
credentials issued or to generate a credential proof. Thus we assume that
algorithms (SecGen, NymGen, NymVerify) to generate user secret keys, to generate
pseudonyms, and to verify pseudonyms are available. $\mathcal{F}_{Cred}$ is given NymVerify
as an input parameter and will use this algorithm to check the relation between
$P$ and $X$. For the security properties guaranteed by $\mathcal{F}_{Cred}$, we do not make any
assumptions about the security properties of these algorithms. However, for the
security of the overall credential scheme, pseudonyms need to be commitments
to $X$, i.e., to be binding and hiding w.r.t. $X$.
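
The requirement that pseudonyms be binding and hiding commitments to the user
secret can be illustrated with a Pedersen-style commitment in a toy group. This
is a stand-in chosen for brevity, not the structure-preserving commitment scheme
SPC used in the construction: it commits to the exponent $x$ rather than to the
group element $X$, and the parameters, the second generator `H` and the function
names are assumptions. In a real setup nobody may know the discrete logarithm of
`H` to the base `G`.

```python
import secrets

P, Q = 2039, 1019   # toy safe-prime group (far too small for real use)
G, H = 4, 9         # two elements of the subgroup of order Q

def nym_gen(x: int):
    # Pseudonym = commitment to the user secret x under a fresh opening o.
    o = secrets.randbelow(Q)
    return (pow(G, x, P) * pow(H, o, P)) % P, o

def nym_verify(x: int, nym: int, o: int) -> bool:
    # Recompute the commitment from the claimed secret and opening.
    return nym == (pow(G, x, P) * pow(H, o, P)) % P
```

Fresh openings let the same secret appear under different pseudonyms, while a
pseudonym does not verify for a different secret under the same opening.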

In the following we provide the definition of $\mathcal{F}_{Cred}$ and a protocol $\mathcal{R}_{Cred}$ that
realizes $\mathcal{F}_{Cred}$ using $\mathcal{F}_{CA}$ and $\mathcal{F}_{CRS}$, assuming static corruptions.

Single Issuer Ideal Functionality. The starting point for our credential
functionality is the ideal functionality of unlinkable redactable signatures, extended in a
number of ways. Similar to $\mathcal{F}_{URS}$ (and in line with other UC-functionalities such
as $\mathcal{F}_{SIG}$ that need to output cryptographic values), $\mathcal{F}_{Cred}$ is handed a number
of cryptographic algorithms by the simulator. These algorithms allow $\mathcal{F}_{Cred}$ to
produce cryptographic artifacts for proofs of credential ownership and attribute
disclosure, to verify such proofs (when they are coming from the adversary),
and to extract values from proofs. (We note that there are no artifacts for the
credentials themselves.) While these algorithms can be completely adversarial,
$\mathcal{F}_{Cred}$ will enforce that the algorithms (and the artifacts produced by them) satisfy
the required unforgeability and privacy properties. In fact, because of the privacy
properties, $\mathcal{F}_{Cred}$ needs to run these algorithms itself and cannot ask the
simulator for the artifacts as is sometimes done (cf. $\mathcal{F}_{URS}$ and the UC-functionalities
for blind signatures [6,42]).

We now describe the steps of our ideal functionality $\mathcal{F}_{Cred}$ (cf. Fig. 1) and
explain the security properties it ensures and how it does so. Note that because
we consider static corruption, $\mathcal{F}_{Cred}$ and $\mathcal{SIM}$ are aware of which parties are
corrupted.

$\mathcal{F}_{Cred}$ maintains two tables for bookkeeping: $\mathcal{M}_{iss}$ stores information about
issued credentials and $\mathcal{M}_{pres}$ stores information about credentials that
produced presentation proofs. It then handles requests as follows. Upon receiving
a (keygen, sid) message, $\mathcal{F}_{Cred}$ performs a setup by asking the simulator for the
system parameters, the keys of the issuer, trapdoors, a set of algorithms and
a list of corrupted parties. The message (leakSK, sid) is handled in exactly the
same way as for redactable signatures.

Next, upon receiving an (issueCred, sid, qid, $\mathcal{U}$, $X$, $P$, $aux(P)$) message from
a user $\mathcal{U}$, $\mathcal{F}_{Cred}$ initiates credential issuing by sending a corresponding message
to the issuer specified in sid = ($\mathcal{I}$, sid'). If $\mathcal{I}$ responds to the request with a list
of attributes $a$, $\mathcal{F}_{Cred}$ verifies that $X$, $P$, and $aux(P)$ form a valid pseudonym
(i.e., NymVerify outputs 1), and, if so, records in $\mathcal{M}_{iss}$ that a credential with
attributes $a$ to user $\mathcal{U}$ w.r.t. secret $X$ has been issued.

Upon receiving a credential proof request in the form of a (proveCred, . . .)
message, $\mathcal{F}_{Cred}$ verifies whether the provided $X$, $P$, and $aux(P)$ form a valid
pseudonym and whether a credential with attributes $a$ to user $\mathcal{U}$ w.r.t. secret $X$
has been issued. Then, $\mathcal{F}_{Cred}$ creates a cryptographic artifact for the proof using
the Cred.SimProve algorithm, where no information that must not be revealed is
input to the algorithm. This will guarantee the privacy properties of the
credential proof for honest users. Furthermore, before outputting the proof to the user,
$\mathcal{F}_{Cred}$ will verify it using Cred.Verify to ensure correctness.

Finally, upon receiving the (verifyCredProof, . . .) message, $\mathcal{F}_{Cred}$ has to
determine whether or not the proof should be accepted. Here we need to deal
with unforgeability of credential proofs (and thus of credentials) if the issuer is
honest and its secret key has not been leaked. Naturally, $\mathcal{F}_{Cred}$ should accept
proofs that it has generated itself. Apart from that, $\mathcal{F}_{Cred}$ could in principle just
accept all proofs for which the revealed attributes correspond to a credential
that was issued. This would allow the adversary to also produce proofs that
match credentials that were not issued to dishonest users but only to an honest
user. To prevent this, we require an extraction algorithm Cred.Extract which, on
input a credential proof, will generate a user secret. Then, $\mathcal{F}_{Cred}$ will accept a
credential proof only if the revealed attributes correspond to a credential that
was issued to a corrupted user w.r.t. the $X'$ extracted from the proof. That,
however, would still allow (dishonest) users to mix and match their credentials.
Therefore, $\mathcal{F}_{Cred}$ will accept the proofs only if the extracted $X'$ underlies the
pseudonym $P'$ w.r.t. which the proof verifies. $\mathcal{F}_{Cred}$ checks the latter
using NymVerify.
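
The acceptance rule described in this paragraph can be sketched as plain control
flow. The record layouts, the `State` container and all parameter names are
assumptions made for the sketch; the three algorithms are passed in as opaque
callables, mirroring how the functionality receives them from the simulator.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class State:
    pk: object
    issuer_honest: bool = True
    keyleak: bool = False
    corrupted_users: set = field(default_factory=set)
    m_iss: list = field(default_factory=list)    # (user, secret, attributes)
    m_pres: list = field(default_factory=list)   # (ctx, nym, revealed, proof)

def matches(attributes: tuple, revealed: tuple) -> bool:
    # revealed uses None at the positions that are not disclosed.
    return len(attributes) == len(revealed) and all(
        r is None or r == a for a, r in zip(attributes, revealed))

def verify_cred_proof(st: State, cred_verify: Callable, cred_extract: Callable,
                      nym_verify: Callable, pk, nym, proof, revealed, ctx) -> bool:
    result = bool(cred_verify(pk, nym, proof, revealed, ctx))
    # Without an honest issuer and an unleaked key there is nothing to enforce:
    # report what the verification algorithm says.
    if pk != st.pk or st.keyleak or not st.issuer_honest or not result:
        return result
    # Proofs generated by the functionality itself are accepted.
    if (ctx, nym, revealed, proof) in st.m_pres:
        return True
    # Otherwise extract the user secret and check it underlies the pseudonym.
    x, aux = cred_extract(pk, nym, proof, revealed, ctx)
    if not nym_verify(nym, x, aux):
        return False
    # Accept only if a matching credential was issued to a corrupted user on x.
    return any(user in st.corrupted_users and secret == x and matches(attrs, revealed)
               for user, secret, attrs in st.m_iss)
```

The last two checks are the ones that rule out proofs on an honest user's
credential and the mixing of credentials issued on different secrets.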

Realization of $\mathcal{F}_{Cred}$. A protocol $\mathcal{R}_{Cred}$ that realizes $\mathcal{F}_{Cred}$ can be obtained from
the algorithms described in Sect. 4.1 in the ($\mathcal{F}_{CRS}$, $\mathcal{F}_{CA}$)-hybrid model where $SP$
is the reference string and each call to $\mathcal{F}_{Cred}$ is replaced by running the
corresponding algorithms. The detailed description of $\mathcal{R}_{Cred}$ is given in the full
version [9].

For efficiency reasons related to the integration of pseudonyms (which requires
zero-knowledge proofs and thus whitebox techniques), $\mathcal{R}_{Cred}$ does not use $\mathcal{R}_{URS}$
as a (blackbox) subroutine. We will, however, carefully align the internals of $\mathcal{F}_{Cred}$
and $\mathcal{R}_{Cred}$ with those of $\mathcal{F}_{URS}$ and $\mathcal{R}_{URS}$ respectively, such that we can use the UC
emulation theorem in one of the hybrid steps of our security proof.

Theorem 7. Let URS be an unlinkable redactable signature scheme according
to Definition 1, SPC be a structure-preserving commitment scheme, $R_{gap}$ be
a verifiable relation, $\Pi$ be a non-interactive proof-of-knowledge system. Then
$\mathcal{R}_{Cred}$ securely realizes $\mathcal{F}_{Cred}$ in the ($\mathcal{F}_{CRS}$, $\mathcal{F}_{CA}$)-hybrid model if URS is correct,
unlinkable, unforgeable, and key extractable, SPC is binding, the non-interactive
proof-of-knowledge system is zero-knowledge and simulation extractable, and the
$R_{gap}$ relation is hard. The proof is provided in the full version [9].

Functionality $\mathcal{F}_{Cred}$(NymVerify)

The functionality maintains tables $\mathcal{M}_{iss}$ and $\mathcal{M}_{pres}$ initialized to $\emptyset$ and flags kg
and keyleak which are initially unset.

- On input (keygen, sid) from $\mathcal{I}$, verify that sid = ($\mathcal{I}$, sid') for some sid' and
  that flag kg is unset. If not, then return $\bot$. Else, do the following:
  1. Send (initF, sid) to $\mathcal{SIM}$ and wait for a message (initF, sid, $SP$, $sk$, $pk$,
     $td_{sim}$, $td_{ext}$, Cred.SimProve, Cred.Verify, Cred.Extract) from $\mathcal{SIM}$, where
     $SP$ are the system parameters, $td_{sim}$ and $td_{ext}$ are the simulation and
     extraction trapdoors respectively, and the rest are polynomial-time
     algorithms. Store all of these values and set flag kg.
  2. Return (verificationKey, sid, $pk$) to $\mathcal{I}$.

- On input (leakSK, sid) from $\mathcal{I}$, verify that sid = ($\mathcal{I}$, sid') for some sid'. If
  not, return $\bot$. Else, if flag kg is set, set flag keyleak and return
  (leakSK, sid, $sk$); otherwise abort.

- On input (issueCred, sid, qid, $X$, $P$, $aux(P)$) from $\mathcal{U}$, check sid = ($\mathcal{I}$, sid')
  for some sid', and that flag kg is set. If not, return $\bot$. Else send a public
  delayed output (issueCred, sid, qid, $P$) to $\mathcal{I}$.

- On input (issueCred, sid, qid, $a$) from $\mathcal{I}$, check for
  (issueCred, sid, qid, $X$, $P$, $aux(P)$) from $\mathcal{U}$, and verify that sid = ($\mathcal{I}$, sid') for
  some sid' and that the flag kg is set. If not, return $\bot$. Else, do the following:
  1. Run $b \leftarrow$ NymVerify($SP$, $P$, $X$, $aux(P)$). If $b = 0$, return $\bot$.
  2. Add (ISS, $\epsilon$, $X$, $a$) to $\mathcal{M}_{iss}$.
  3. Send a public delayed output (credIssued, sid, qid, $a$) to $\mathcal{U}$.
  4. When (credIssued, sid, qid, $a$) is delivered to $\mathcal{U}$, update the issuance
     record by adding the user to (ISS, $\mathcal{U}$, $X$, $a$) of $\mathcal{M}_{iss}$.

- On input (proveCred, sid, $X$, $P'$, $aux(P)'$, $I$, $a$, $ctx$) from $\mathcal{U}$, do the following:
  1. Check if kg is set. If not, return $\bot$.
  2. Check if NymVerify($SP$, $P'$, $X$, $aux(P)'$) = 1. If not, return $\bot$.
  3. Check if (ISS, $\mathcal{U}$, $X$, $a$) exists. If not, return $\bot$.
  4. $\pi_{X,Cred} \leftarrow$ Cred.SimProve($SP$, $sk$, $td_{sim}$, $P'$, $a_I$, $ctx$).
  5. Check if Cred.Verify($SP$, $pk$, $P'$, $\pi_{X,Cred}$, $a_I$, $ctx$) = 0, then output $\bot$.
  6. Add (PRES, $\mathcal{U}$, $ctx$, $X$, $P'$, $aux(P)'$, $a_I$, $\pi_{X,Cred}$) to $\mathcal{M}_{pres}$.
  7. Return (credProved, sid, $a_I$, $\pi_{X,Cred}$) to $\mathcal{U}$.

- On input (verifyCredProof, sid, $pk'$, $P'$, $\pi'_{X,Cred}$, $a'_I$, $ctx'$) from some party
  $\mathcal{V}$, do the following:
  1. Verify the proof: result $\leftarrow$ Cred.Verify($SP$, $pk'$, $P'$, $\pi'_{X,Cred}$, $a'_I$, $ctx'$).
  2. If $pk \neq pk'$, or keyleak is set, or $\mathcal{I}$ is dishonest, or result = 0, send
     (verified, sid, $a'_I$, result) to $\mathcal{V}$.
  3. Else, if there is a record (PRES, $*$, $ctx'$, $*$, $P'$, $*$, $a'_I$, $\pi'_{X,Cred}$), return
     (verified, sid, $a'_I$, 1) to $\mathcal{V}$.
  4. Else, run ($X'$, $aux(P)'$) $\leftarrow$ Cred.Extract($SP$, $td_{ext}$, $pk$, $P'$, $\pi'_{X,Cred}$, $a'_I$, $ctx'$).
  5. If NymVerify($SP$, $P'$, $X'$, $aux(P)'$) = 0, return (verified, sid, $a'_I$, 0) to $\mathcal{V}$.
  6. Else, if there is a record (ISS, $\mathcal{U}$, $X'$, $a$) in $\mathcal{M}_{iss}$ for a corrupted user $\mathcal{U}$
     such that $a_I = a'_I$, return (verified, sid, $a'_I$, 1) to $\mathcal{V}$.
  7. Otherwise return (verified, sid, $a'_I$, 0) to $\mathcal{V}$.

Fig. 1. The ideal functionality for single issuer anonymous credentials

Building a Full-Fledged Credential System with Multiple Issuers. We now explain
how to use our credential functionality to support multiple issuers using multiple
sessions of $\mathcal{F}_{Cred}$, one for each issuer, together with algorithms (SecGen, NymGen,
NymVerify) to generate user secret keys, to generate pseudonyms, and to verify
pseudonyms. The pseudonyms are required to be both hiding and binding w.r.t.
the user secret to provide privacy to the honest users and to prevent colluding
users from sharing credentials unless they all use the same user secret. A user
can now generate a user secret and different pseudonyms on it and then use
multiple calls to the $\mathcal{F}_{Cred}$ instances for different issuers to get credentials on
her pseudonyms. To compose a presentation proof that reveals attributes from
different credentials, the user creates a pseudonym $P'$ and uses the corresponding
$\mathcal{F}_{Cred}$ instances to generate the required proofs with respect to this pseudonym.
Because the pseudonym is the same in different proofs and each proof guarantees
the same underlying secret in the credential and the pseudonym, the collection of
these proofs together results in a single proof for multiple credentials. Each proof
block guarantees unlinkability and unforgeability, and because the pseudonym is
both binding and hiding this composed proof is also unforgeable and unlinkable
with respect to other proof collections. The verification is done by querying the
corresponding $\mathcal{F}_{Cred}$ instances for verification of each particular proof part and by
checking that the pseudonym is the same in each proof part. A formal definition
of a full-fledged credential scheme and a proof that the scheme just sketched meets
it are left as future work.
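
The verification of a composed proof, as sketched informally above, amounts to
two checks: every part verifies at the instance of its issuer, and all parts name
the same pseudonym. In the sketch below the per-issuer verifier is an arbitrary
callable standing in for a call to the corresponding functionality instance; the
`ProofPart` layout and all names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

@dataclass(frozen=True)
class ProofPart:
    issuer: str
    pseudonym: str
    revealed: tuple
    proof: bytes

def verify_composed(parts: Sequence[ProofPart],
                    verifiers: Dict[str, Callable[[ProofPart, bytes], bool]],
                    ctx: bytes) -> bool:
    if not parts:
        return False
    # All parts must be presented under one and the same pseudonym.
    if len({part.pseudonym for part in parts}) != 1:
        return False
    # Each part is checked by the instance that belongs to its issuer.
    return all(part.issuer in verifiers and verifiers[part.issuer](part, ctx)
               for part in parts)
```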


4.3 Instantiation and Efficiency Analysis 

To analyze the efficiency of our scheme we consider a concrete instantiation
scenario. We instantiate our non-interactive construction with Groth-Sahai proofs
[45], the structure-preserving commitment scheme of [4], and our unlinkable
redactable signature scheme presented in Sect. 3.3. We use disjunctive proofs
to make the proof system simulation-extractable [22]; see [54] for the efficient
instantiation with 48 group elements overhead in the XDH setting that forms
the basis of our efficiency analysis. As a hard relation we pick the
Computational Diffie-Hellman problem. The URS scheme is instantiated with the fully
structure-preserving signature scheme by Abe et al. [5], Groth-Sahai proofs, and
the vector commitment scheme from Sect. 3.1. The proof of Theorem 8 follows
from Theorems 6-7.

Theorem 8. The credential system described above securely realizes $\mathcal{F}_{Cred}$ defined
in Sect. 4.2 if the SXDH, n-RootDH, n-BSDH, q-SDH, XDLIN, co-CDH, and DBP
assumptions hold. We refer to the respective building blocks for the definitions
of these assumptions.

We refer to the full version [9] for the comparison with prior work. We stress 
that the complexity of the Prove and Verify algorithms is independent of the 
number of all attributes contained in a credential. 

The size of the credential proof is roughly 178 group elements (148 when using
the SPS of [2] instead of FSPS). This means that the communication efficiency
for showing a credential with respect to a pseudonym is around 11 KB (9 KB for
SPS) at the 128-bit security level, which is close to Idemix credentials [31], as the size
of pairing groups is much smaller than the size of RSA groups and because the
size of Idemix credential proofs is linear in the number of attributes. Besides,
Idemix credentials do not provide such strong formal security guarantees, i.e.,
they require random oracles for non-interactive proofs and are not universally
composable. Our non-UC scheme is comparable in efficiency with the credential
system of Izabachène et al. [50] that has credential proofs of around 8 KB, while
our UC scheme has larger proof sizes. Our scheme is much less efficient than the
scheme of [47], but their scheme relies on hash functions in their construction
and thus does not enable efficient protocol design.
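
As a sanity check on the quoted proof sizes, one can compute the average number
of bytes per group element that the figures imply. The calculation below only
uses the numbers stated above and assumes 1 KB = 1024 bytes; the split between
elements of the two source groups is not given here, so the result is an average
and not a parameter of the scheme.

```python
def bytes_per_element(total_kb: float, elements: int, kb: int = 1024) -> float:
    # Average encoded size of one group element implied by a total proof size.
    return total_kb * kb / elements

fsps_avg = bytes_per_element(11, 178)   # proof size with FSPS
sps_avg = bytes_per_element(9, 148)     # proof size with the SPS of [2]

print(f"FSPS: about {fsps_avg:.0f} bytes per element")
print(f"SPS:  about {sps_avg:.0f} bytes per element")
```

Both variants come out at a little over 60 bytes, i.e. roughly 500 bits per
element on average, so the two quoted sizes are consistent with each other.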

Open Questions. We leave the construction of a scheme that achieves the same 
functionality as ours with the efficiency of [47] — perhaps using fully structure 
preserving signatures of equivalence classes — as an interesting open problem. 
Other interesting questions are exploiting the lack of opening non-malleability
for attacks on existing constructions and efficiently basing the opening
non-malleability property of vector commitments on a more standard cryptographic
assumption than the n-RootDH assumption of Definition 9.

Abstract. We describe three contributions regarding the Soft Analytical
Side-Channel Attacks (SASCA) introduced at Asiacrypt 2014. First,
we compare them with Algebraic Side-Channel Attacks (ASCA) in a 
noise-free simulated setting. We observe that SASCA allow more efficient 
key recoveries than ASCA, even in this context (favorable to the latter). 
Second, we describe the first working experiments of SASCA against an 
actual AES implementation. Doing so, we analyse their profiling require- 
ments, put forward the significant gains they provide over profiled Dif- 
ferential Power Analysis (DPA) in terms of number of traces needed 
for key recoveries, and discuss the specificities of such concrete attacks 
compared to simulated ones. Third, we evaluate the distance between 
SASCA and DPA enhanced with computational power to perform enu- 
meration, and show that the gap between both attacks can be quite 
reduced in this case. Therefore, our results bring interesting feedback for 
evaluation laboratories. They suggest that in several relevant scenarios 
(e.g. attacks exploiting many known plaintexts), taking a small mar- 
gin over the security level indicated by standard DPA with enumeration 
should be sufficient to prevent more elaborate attacks such as SASCA. By 
contrast, SASCA may remain the only option in more extreme scenarios 
(e.g. attacks with unknown plaintexts/ciphertexts or against leakage- 
resilient primitives). We conclude by recalling the algorithmic depen- 
dency of the latter attacks, and therefore that our conclusions are specific 
to the AES. 


1 Introduction 

State-of-the-art. Strategies to exploit side-channel leakages can be classified as
Divide and Conquer (DC) and analytical. In the first case, the adversary recovers
information about different bytes of (e.g.) a block cipher key independently, and
then combines this information, e.g. via enumeration [36]. In the second case,
she rather tries to recover the full key at once, exploiting more algorithmic
approaches to cryptanalysis with leakage. Rephrasing Banciu et al., one can see
these different strategies as a tradeoff between pragmatism and elegance [2].

In brief, the “DC+enumeration” approach is pragmatic, i.e. it is easy to
implement, requires little knowledge about the target implementation, and can
take advantage of a variety of popular (profiled and non-profiled) distinguishers,
such as Correlation Power Analysis (CPA) [6], Mutual Information Analysis
(MIA) [14], Linear Regression (LR) [34] or Template Attacks (TA) [8]. We will
use the term Differential Power Analysis (DPA) to denote them all [22].
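
To make the DC+enumeration strategy concrete, the sketch below combines
independent per-subkey candidate probabilities (as a distinguisher would output
them) and lists full keys in order of decreasing joint probability. It is a
naive illustration that walks through the whole product of candidates, which is
only feasible for toy sizes; dedicated key enumeration algorithms (cf. [36])
avoid this. The data layout and the function name are assumptions.

```python
import heapq
import itertools
import math
from typing import Dict, List, Tuple

def enumerate_keys(subkey_probs: List[Dict[int, float]],
                   limit: int) -> List[Tuple[Tuple[int, ...], float]]:
    # The joint probability of a full key is the product of the probabilities
    # of its independently recovered subkeys.
    candidates = (
        (tuple(k for k, _ in combo), math.prod(p for _, p in combo))
        for combo in itertools.product(*(d.items() for d in subkey_probs))
    )
    # Keep the `limit` most likely full keys, most likely first.
    return heapq.nlargest(limit, candidates, key=lambda kp: kp[1])

# Two key bytes with two candidates each (made-up probabilities).
probs = [{0x2B: 0.7, 0x7E: 0.3}, {0x15: 0.6, 0x16: 0.4}]
ranked = enumerate_keys(probs, 3)
```

The adversary would test the keys in `ranked` one after the other against a
known plaintext/ciphertext pair until the correct one is found.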

By contrast, analytical approaches are (more) elegant, since they theoreti- 
cally exploit all the information leaked by an implementation (vs. the leakages 
of the first and/or last rounds independently for DC attacks). As a result, these 
attacks can (theoretically) succeed in conditions where the number of measure- 
ments available to the adversary is very limited. But this elegance (and the 
power that comes with it) usually implies stronger assumptions on the target 
implementation (e.g. most of them require some type of profiling). The Algebraic 
Side-Channel Attacks (ASCA) described in [30] and further analyzed in [7,32] 
are an extreme solution in this direction. In this case, the target block cipher and 
its leakages are represented as a set of equations that are then solved (e.g. with 
a SAT solver, or Groebner bases). This typically implies a weak resistance to the 
noise that is usually observed in side-channel measurements. As a result, various 
heuristics have been suggested to better deal with errors in the information leak- 
ages, such as [24,39]. The Tolerant Algebraic Side-Channel Attacks (TASCA) 
proposed in [25,26] made one additional step in this direction, by replacing the 
solvers used in ASCA by an optimizer. But they were limited by their high mem- 
ory complexity (since they essentially deal with noise by exhaustively encoding 
the errors they may cause). More recently, two independent proposals suggested 
to design a dedicated solver specialized to byte-oriented ciphers such as the 
AES [16,27]. The latter ones were more efficient and based on smart heuristics
exploiting enumeration. Eventually, Soft Analytical Side-Channel Attacks
(SASCA) were introduced at Asiacrypt 2014 as a conceptually different way
to exploit side-channel leakages analytically [38]. Namely, rather than encoding 
them as equations, SASCA describe an implementation and its leakages as a 
code, that one can efficiently decode using the Belief Propagation (BP) algo- 
rithm. As a result, they can directly exploit the (soft) information provided by 
profiled side-channel attacks (such as LR or TA), in an efficient manner, with 
limited memory complexity, and for multiple plaintexts. Concretely, this implies 
that they provide a natural bridge between DC attacks and analytical ones. 

Our Contribution. In view of this state-of-the-art, we consider three open 
problems regarding DC and analytical strategies in side-channel analysis. 

First, we observe that the recent work in [38] experimented with SASCA in the
context of noisy AES leakages. While this context allowed showing that SASCA
are indeed applicable in environments where ASCA would fail, it leaves open the
question whether this comes at the cost of a lower efficiency in a noise-free
context. Therefore, we launched various experiments with noise-free AES
leakages to compare ASCA and SASCA. These experiments allowed us to confirm
that also in this context, SASCA are equally (even slightly more) efficient.

Second, the experiments in [38] exploited simulations in order to exhibit 
the strong noise-resilience of SASCA (since the amount of noise can then be 
used as a parameter of such simulations). But this naturally eludes the question 
of the profiling of a concrete device, which can be a challenging task, and for
which the leakage functions of different target intermediate values may turn out
to be quite different [13]. Therefore, we describe the first working experiments 
of SASCA against an actual AES implementation, for which a bivariate TA 
exploiting the S-box input/output leakages would typically be successful after 
more than 50 measurements. We further consider two cases for the adversary’s 
knowledge about the implementation. In the first one, she has a precise descrip- 
tion in hand (i.e. the assembly code, typically). In the second one, she only knows 
AES is running, and therefore only exploits the generic operations that one can 
assume from the algorithm specification. 1 Our experiments confirm that SASCA 
are applicable in a simple profiled scenario, and lead to successful key recoveries 
with fewer traces than a DC attack (by an approximate factor up to 5). They
also allow us to discuss the profiling cost, and the consequences of the different 
leakage functions in our target implementation. A relevant observation regarding 
them is that weak leakages in the MixColumns operations are especially damag- 
ing for the adversary, which can be explained by the (factor) graph describing an 
AES implementation: indeed, XORing two values with limited information sig- 
nificantly reduces the information propagation of the BP algorithm execution. 
This suggests interesting research directions for preventing such attacks, since
protecting the linear parts of a block cipher is usually easier/cheaper.

Third, we note that SASCA are in general more computationally intensive 
than DC attacks. Therefore, a fair comparison should allow some enumeration 
power to the DC attacks as well. We complement our previous experimental 
attacks by considering this last scenario. That is, we compare the success rate 
of SASCA with the ones of DC attacks exploiting a computational power
corresponding to up to $2^{30}$ encryptions (which corresponds to more than the
execution time of SASCA on our computing platform). Our results put forward that
SASCA remain the most powerful attack in this case, but with a lower gain. 

Summary. These contributions allow answering the question of our title. First, 
SASCA are in general preferable to ASCA, with both noise-free and noisy AES 
leakages. Second, the tradeoff between SASCA and DC attacks is more balanced. 
As previously mentioned, DC attacks are more pragmatic. So the interest of 
SASCA essentially depends on the success rate gains it provides, which itself 
depends on the scenarios. If multiple plaintext/ciphertext pairs are available,
our experiments suggest that the gain of SASCA over DPA with enumeration is 
somewhat limited, and may not justify such an elegant approach. This conclusion 
backs up the results in [2], but in a more general scenario, since we consider 
multiple-query attacks rather than single-query ones, together with a more
powerful analytical strategy. By contrast, if plaintexts/ciphertexts are unknown
(which renders DPA [17] and enumeration more challenging to apply), or if the
number of plaintexts one can observe is very limited (e.g. by design, due to a 
leakage-resilient primitive [10]), SASCA may be the best/only option. 

1 Admittedly, such a generic scenario still assumes that the target implementation 
closely follows the specifications given in [11] which may not always be the case, e.g. 
for bitslice implementations [29], or T-table based implementations [9].

Preliminary Remark. Our focus in this paper is on a couple of extreme
approaches to side-channel analysis, i.e. the most pragmatic DC attacks against 
8-bit targets of the first AES round, and the most elegant ASCA/SASCA exploit- 
ing most/all such targets in the implementation. Quite naturally, the other ana- 
lytical attacks mentioned in this introduction would provide various tradeoffs 
between these extremes. Besides, more computationally-intensive DPA attacks 
(based on larger key hypotheses) are also possible, as recently discussed by 
Mather et al. [23]. Such attacks are complementary and may further reduce 
the gain of SASCA over DPA, possibly at the cost of increased computational 
requirements (e.g. the latter work exploited high-performance computing 
whereas all our experiments were carried out on a single desktop computer). 

2 Background 

In this section we first describe the measurement setup used in our experiments. 
Then, we describe two tools we used to identify and evaluate information leakages 
in the traces. Finally, we recall the basics of the different attacks we compare. 


2.1 Measurement Setup 

Our measurements are based on the open source AES FURIOUS implementa- 
tion (http://point-at-infinity.org/avraes) run by an 8-bit Atmel ATMEGA644p 
microcontroller at a 20 MHz clock frequency. We monitored the power consumption
across a 22 Ω resistor. Acquisitions were performed using a LeCroy WaveRunner
HRO 66 ZI providing 8-bit samples, running at 400 Msamples/second. For
SASCA, we can exploit any intermediate values that appear during the AES 
computation. Hence, we measured the full encryption. Our traces are composed 
of 94 000 points, containing the key scheduling and encryption rounds. Our pro- 
filing is based on 256 000 traces corresponding to random plaintexts and keys. 
As a result, we expect around 1 000 traces for each value of each intermediate 
computation. We use $l^i_{n,x}$ for the leakage of the value $x$ of the $n$-th
intermediate value in the $i$-th leakage trace, and $l^i_{n,x}(t)$ when we access
the $t$-th point (sample) of this trace.


2.2 Information Detection Tools 

Since SASCA can exploit many target intermediate values, we need to identify 
the time samples that contain information about them in our traces, next referred 
to as Points Of Interest (POI). We recall two simple methods for this purpose, 
and denote the POI of the $n$-th intermediate value in our traces with $t_n$.


(a) Correlation Power Analysis (CPA) [6] is a standard side-channel
distinguisher that estimates the correlation between the measured leakages and
some key-dependent model for a target intermediate value. In its standard
version, an a-priori (here, Hamming weight) model is used for this purpose.
In practice, this estimation is performed by sampling (i.e. measuring) traces
from a leakage variable $L$ and a model variable $M_k$, using Pearson's
correlation coefficient:

$$\hat{\rho}_k(L, M_k) = \frac{\hat{\mathsf{E}}\left[(L - \hat{\mu}_L)(M_k - \hat{\mu}_{M_k})\right]}{\sqrt{\hat{\mathsf{var}}(L)\,\hat{\mathsf{var}}(M_k)}}.$$

In this equation, $\hat{\mathsf{E}}$ and $\hat{\mathsf{var}}$ respectively denote
the sample mean and variance operators, and $\hat{\mu}_L$ is the sample mean of
the leakage distribution $L$. CPA is a univariate distinguisher and therefore
launched sample by sample.
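
As an illustration of how such a sample-by-sample CPA can be used to locate a POI, the following minimal Python sketch computes Pearson's correlation between a Hamming weight model and every time sample of a set of traces. The simulated traces, the noise level and the function names (`hamming_weight`, `cpa`) are assumptions of this sketch and do not correspond to the measurement setup described above.

```python
import numpy as np

def hamming_weight(values):
    """Hamming weight of each 8-bit value in an integer array."""
    values = np.asarray(values, dtype=np.uint8)
    return np.unpackbits(values[:, None], axis=1).sum(axis=1)

def cpa(traces, model):
    """Pearson correlation between a model vector and every time sample.

    traces: (n_traces, n_samples) array of leakages.
    model:  (n_traces,) array of model predictions (e.g. Hamming weights).
    Returns a (n_samples,) array of correlation coefficients.
    """
    traces = np.asarray(traces, dtype=np.float64)
    model = np.asarray(model, dtype=np.float64)
    t_centered = traces - traces.mean(axis=0)
    m_centered = model - model.mean()
    cov = m_centered @ t_centered / len(model)
    return cov / np.sqrt(traces.var(axis=0) * model.var())

# Toy simulation (hypothetical data): sample 7 leaks HW(value) plus Gaussian noise.
rng = np.random.default_rng(0)
values = rng.integers(0, 256, size=2000)
traces = rng.normal(0.0, 1.0, size=(2000, 20))
traces[:, 7] += hamming_weight(values)

rho = cpa(traces, hamming_weight(values))
poi = int(np.argmax(np.abs(rho)))  # candidate point of interest
```

In this toy setting the only informative sample is the one where the Hamming weight was injected, so the largest absolute correlation is expected there.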


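(b) The Signal-to-Noise Ratio (SNR) [21] of the $n$-th intermediate value
at the time sample $t$ can be defined according to Mangard's formula [21]:

$$\mathrm{SNR}_n(t) = \frac{\hat{\mathsf{var}}_x\left(\hat{\mathsf{E}}_i\left(l^i_{n,x}(t)\right)\right)}{\hat{\mathsf{E}}_x\left(\hat{\mathsf{var}}_i\left(l^i_{n,x}(t)\right)\right)}.$$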

Although connected (high SNRs imply efficient CPA if the right model is used),
these metrics provide slightly different intuitions. In particular, the SNR cannot
tell apart the input and output leakages of a bijective operation (such as an 
S-box), since both intermediate values will generate useful signal. This separation 
can be achieved by CPA thanks to its a-priori leakage predictions. 
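
The SNR definition above can be sketched in a few lines of Python: the signal is the variance (over the values $x$) of the per-value mean traces, and the noise is the mean (over $x$) of the per-value variances. The simulated data and the function name `snr` are assumptions made for illustration only.

```python
import numpy as np

def snr(traces, values, n_values=256):
    """SNR per time sample: var_x(mean_i(l)) / mean_x(var_i(l))."""
    traces = np.asarray(traces, dtype=np.float64)
    values = np.asarray(values)
    means, variances = [], []
    for x in range(n_values):
        group = traces[values == x]
        if len(group) < 2:  # skip values observed too rarely to estimate a variance
            continue
        means.append(group.mean(axis=0))
        variances.append(group.var(axis=0))
    return np.var(means, axis=0) / np.mean(variances, axis=0)

# Toy simulation (hypothetical data): sample 3 leaks HW(value) with unit-variance noise.
hw = np.array([bin(v).count("1") for v in range(256)])
rng = np.random.default_rng(0)
values = rng.integers(0, 256, size=20_000)
traces = rng.normal(0.0, 1.0, size=(20_000, 10))
traces[:, 3] += hw[values]

snr_curve = snr(traces, values)
poi = int(np.argmax(snr_curve))
```

Since the Hamming weight of a uniform byte has variance 2 and the simulated noise has variance 1, the SNR at the leaking sample should be close to 2, while it stays near 0 elsewhere.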


2.3 Gaussian Template Attacks

Gaussian TA [8] are the most popular profiled distinguisher. They assume that
the leakages can be interpreted as the realizations of a random variable which
generates samples according to a Gaussian distribution, and work in two steps.
In a profiling phase, the adversary estimates a mean $\hat{\mu}_{n,x}$ and
variance $\hat{\sigma}^2_{n,x}$ for each value $x$ of the $n$-th intermediate
computation. In practice, this is done for the time sample $t_n$ obtained thanks
to the previously mentioned POI detection tools. Next, in the attack phase and
for each trace $l$, she can calculate the likelihood to observe this leakage at
the time $t_n$ for each $x$ as:

$$\hat{\Pr}[l(t_n)|x] \sim \mathcal{N}(\hat{\mu}_{n,x}, \hat{\sigma}^2_{n,x}).$$

In the context of standard DPA, we typically have $x = p \oplus k$, with $p$ a
known plaintext and $k$ the target subkey. Therefore, the adversary can easily
calculate $\hat{\Pr}[k^*|p, l(t_n)]$ using Bayes' theorem, for each subkey
candidate $k^*$:

$$\hat{\Pr}[k^*] = \prod_i \hat{\Pr}[k^*|p, l^i(t_n)].$$

To recover the full key, she can run a TA on each subkey independently.
By contrast, in the context of SASCA, we will directly insert the knowledge (i.e.
probabilities) about any intermediate value $x$ in the (factor) graph describing
the implementation, and try to recover the full key at once.
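
The two steps of a univariate Gaussian TA in the DPA setting can be sketched as follows. The leakage simulation (Hamming weight of $p \oplus k$ plus Gaussian noise), the number of traces, the key value and the function names `profile` and `attack` are hypothetical choices made for this illustration; log-probabilities are used to avoid numerical underflow when multiplying over traces.

```python
import numpy as np

HW = np.array([bin(v).count("1") for v in range(256)])

def profile(leakages, values):
    """Profiling phase: one Gaussian template (mean, variance) per value x."""
    mu = np.array([leakages[values == x].mean() for x in range(256)])
    var = np.array([leakages[values == x].var() for x in range(256)])
    return mu, var

def attack(leakages, plaintexts, mu, var):
    """Attack phase: log Pr[k*] for the 256 subkey candidates (up to a constant)."""
    candidates = np.arange(256)
    log_pr = np.zeros(256)
    for l, p in zip(leakages, plaintexts):
        x = p ^ candidates  # x = p XOR k* for every candidate k*
        log_lik = -0.5 * np.log(2 * np.pi * var[x]) - (l - mu[x]) ** 2 / (2 * var[x])
        # Bayes with a uniform prior on k*: normalise the likelihoods over k*.
        log_pr += log_lik - np.logaddexp.reduce(log_lik)
    return log_pr

# Toy simulation (hypothetical leakage model and parameters).
rng = np.random.default_rng(0)
prof_values = rng.integers(0, 256, size=50_000)
prof_leak = HW[prof_values] + rng.normal(0.0, 1.0, size=50_000)
mu, var = profile(prof_leak, prof_values)

key = 0x2B
plaintexts = rng.integers(0, 256, size=200)
att_leak = HW[plaintexts ^ key] + rng.normal(0.0, 1.0, size=200)
log_pr = attack(att_leak, plaintexts, mu, var)
best = int(np.argmax(log_pr))
```

In a SASCA, the per-value probabilities produced by the attack phase would instead be attached to the corresponding variable of the factor graph, without the final product over subkey candidates.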

Note that our SASCA experiments consider univariate Gaussian TA whereas 
our comparisons with DPA also consider bivariate TA exploiting the S-box input 
and output leakages (i.e. the typical operations that a divide-and-conquer adver- 
sary would exploit). In the latter case, the previous means and variances just 
have to be replaced by mean vectors and covariance matrices. This choice is 
motivated by our focus on the exploitation of multiple intermediate AES com- 
putations. It could be further combined with the exploitation of more samples 
per intermediate computation, e.g. thanks to dimensionality reduction [1].


2.4 Key Enumeration and Rank Estimation 

At the end of a DC side-channel attack (as the previous TA), the attacker has
probabilities on each subkey. If the master key is not the most probable one,
she can perform enumeration up to some threshold thanks to enumeration
algorithms, e.g. [36]. This threshold depends on the computational power of the
adversary, since enumerating all keys is computationally impossible. If the key is
beyond the threshold of computationally feasible enumeration, and in order to
gain intuition about the computational security remaining after an attack, key
rank estimation algorithms can be used [15,37]. A key rank estimation takes as
input the list of probabilities of all subkeys and the probability of the correct
key (which is only available in an evaluation context), and returns an estimation
of the number of keys that are more likely than the actual key. Rank estimation
allows approximating $d$-th order success rates (i.e. the probability that the
correct key lies among the $d$ first ones rated by the attack) efficiently and
quite accurately. The security graphs introduced in [37] provide a visual
representation of higher-order success rates as a function of the number of
attack traces.
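
To make the input/output behaviour of a rank estimation concrete, the sketch below uses a simple histogram-convolution approximation: the log-probabilities of each subkey are binned, the histograms are convolved to obtain the distribution of full-key scores, and the keys falling in bins above the correct key are counted. This particular approximation, the toy dimensions (three 4-bit subkeys) and the function name `rank_estimate` are assumptions of this sketch; it is not claimed to be the algorithm of [15] or [37].

```python
import numpy as np

def rank_estimate(log_probs, correct, n_bins=2048):
    """Estimate the number of keys with a higher log-probability than the correct one.

    log_probs: list of 1-D arrays, one per subkey (log-probabilities).
    correct:   index of the correct candidate for each subkey.
    """
    lo = min(lp.min() for lp in log_probs)
    hi = max(lp.max() for lp in log_probs)
    width = (hi - lo) / (n_bins - 1)
    total = np.ones(1)  # histogram of the empty sum
    key_bin = 0
    for lp, c in zip(log_probs, correct):
        bins = np.rint((lp - lo) / width).astype(int)
        hist = np.bincount(bins, minlength=n_bins).astype(float)
        total = np.convolve(total, hist)  # histogram of sums of log-probabilities
        key_bin += bins[c]
    return total[key_bin + 1:].sum()

# Toy example (hypothetical scores): 3 subkeys of 4 bits, so 16**3 = 4096 keys.
rng = np.random.default_rng(1)
log_probs = []
for _ in range(3):
    scores = rng.normal(0.0, 2.0, size=16)
    log_probs.append(scores - np.logaddexp.reduce(scores))
correct = [3, 7, 11]

estimate = rank_estimate(log_probs, correct)

# Exact rank by exhaustive search, only feasible for such a small key space.
sums = (log_probs[0][:, None, None]
        + log_probs[1][None, :, None]
        + log_probs[2][None, None, :])
exact = int((sums > sums[3, 7, 11]).sum())
```

The estimate differs from the exact rank only through the binning error, which shrinks as `n_bins` grows; for a full 128-bit key the exhaustive check is of course unavailable and only the estimate can be computed.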

2.5 Algebraic Side-Channel Attacks 

ASCA were introduced in [30] as one of the (if not the) first methods to efficiently
exploit all the informative samples in a leakage trace. We briefly recall their three 
main steps and refer to previous publications for the details. 

1. Construction consists in representing the cipher as an instance of an algebraic 
problem (e.g. Boolean satisfiability, Groebner bases). Because of their large mem- 
ory (RAM) requirements, ASCA generally build a system corresponding to one 
(or a few) traces only. For example, the SAT representation of a single AES trace
in [32] has approximately 18,000 equations in 10,000 variables.

2. Information extraction consists in getting exploitable leakages from the mea- 
surements. For ASCA, the main constraint is that actual solvers require hard 
information. Therefore, this phase usually translates the result of a TA into 
deterministic leakages such as the Hamming weight of the target intermediate
values. Note that the attack is (in principle) applicable with any type of leakages
given that they are sufficiently informative and error-free.

3. Solving. Eventually, the side-channel information extracted in the second 
phase is added to the system of equations constructed in the first phase, and 
generic solvers are launched to solve the system and recover the key. In practice, 
this last phase generally has large RAM requirements causing ASCA to be lim- 
ited to the exploitation of one (or two) measurement traces. 

Summarizing, ASCA are powerful attacks since they can theoretically recover a
key from very few leakage traces, but this comes at the cost of low noise-resilience,
which motivated various heuristic improvements listed in the introduction. The
SASCA described next are a more principled solution to get rid of this limitation.

2.6 Soft Analytical Side-Channel Attacks 

SASCA [38] describe the target block cipher implementation and its leakages 
in a way similar to a Low-Density Parity Check code (LDPC) [12]. Since the 
latter can be decoded using soft decoding algorithms, it implies that SASCA 
can directly use the posterior probabilities obtained during a TA. Similar to 
ASCA, they can also be described in three main steps. 

1. Construction. The cipher is represented as a so-called “factor graph” with 
two types of nodes and bidirectional edges. First, variable nodes represent the 
intermediate values. Second, function nodes represent the a-priori knowledge 
about the variables (e.g. the known plaintexts and leakages) and the operations 
connecting the different variables. Those nodes are connected with bidirectional 
edges that carry two types of messages (i.e. propagate the information) through 
the graph: the type q messages go from variables to functions and the type r
messages go from functions to variables (see [20] for more details).

2. Information extraction. The description of this phase is trivial. The probabil- 
ities provided by TA on any intermediate variable of the encryption process can 
be directly exploited, and added as a function node to the factor graph. 

3. Decoding. Similar to LDPC codes, the factor graph is then decoded using the 
BP algorithm [28]. Intuitively, it essentially iterates the local propagation of the
information about the variable nodes of the target implementation. 

Since our work is mostly focused on concrete investigations of SASCA, we now
describe the BP algorithm in more detail. Our description is largely inspired
by the description of [20, Chapter 26]. For this purpose, we denote by $x_i$ the
$i$-th intermediate value and by $f_i$ the $i$-th function node. As just mentioned,
the nodes will be connected by edges that carry two types of messages. The first
ones go from a variable node to a function node, and are denoted as
$q_{v_n \to f_m}$. The second ones go from a function node to a variable node,
and are denoted as $r_{f_n \to v_m}$. In both cases, $n$ is the index of the
sending node and $m$ the index of the recipient node. The messages carried
correspond to the scores for the different values of the variable nodes. At the
beginning of the algorithm execution, the messages from variable nodes to
function nodes are initialized with no information on the variable. That is, for
all $n, m$ and for all $x_n$ we have:

$$q_{v_n \to f_m}(x_n) = 1.$$

The scores are then updated according to two rules (one per type of messages):

$$r_{f_m \to v_n}(x_n) = \sum_{x_{n'},\, n' \neq n} \Big( f_m(x_{n'}, x_n) \prod_{n' \neq n} q_{v_{n'} \to f_m}(x_{n'}) \Big), \qquad (1)$$

$$q_{v_n \to f_m}(x_n) = \prod_{m' \neq m} r_{f_{m'} \to v_n}(x_n). \qquad (2)$$

In Eq. 2, the variable node $v_n$ sends the product of the messages about $x_n$
received from the other function nodes ($m' \neq m$) to the function node $f_m$,
for each value of $x_n$. And in Eq. 1, the function node $f_m$ sends a sum over all
the possible input values of $f_m$ of the value of $f_m$ evaluated on the vector of
($x_{n'}, n' \neq n$)'s, multiplied by the product of the messages received by
$f_m$ for the considered values of $x_{n'}$. The BP algorithm essentially works by
iteratively applying these rules on all nodes. If the factor graph is a tree (i.e. if
it has no loop), a convergence should occur after a number of iterations at most
equal to the diameter of the graph. In case the graph includes loops (e.g. as in
our AES implementation case), convergence is not guaranteed, but usually occurs
after a number of iterations slightly larger than the graph diameter. The main
parameters influencing the time and memory complexity of the BP algorithm are
the number of possible values for each variable (i.e. $2^8$ in our 8-bit example)
and the number of edges. The time complexity additionally depends on the number
of inputs of the function nodes representing the block cipher operations (since
the first rule sums over all the input combinations of these operations).
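
The two update rules (Eqs. 1 and 2) can be sketched on a very small factor graph. The Python code below is a minimal, unoptimized illustration under several assumptions: variables take 4-bit values (instead of 8-bit ones) to keep the XOR table small, the graph contains a single XOR node with one known-plaintext leaf and one noise-free Hamming weight leaf, and messages are normalized after each update for numerical stability (a normalization that is not part of the rules as stated above). All names are hypothetical.

```python
import numpy as np

def belief_propagation(n_vars, factors, domain, n_iter):
    """Sum-product BP on a factor graph.

    factors: list of (vars, table), with table.shape == (domain,) * len(vars).
    Returns the normalized marginal estimate of every variable.
    """
    edges = [(m, n) for m, (vs, _) in enumerate(factors) for n in vs]
    q = {(n, m): np.ones(domain) for m, n in edges}  # variable -> function
    r = {(m, n): np.ones(domain) for m, n in edges}  # function -> variable
    for _ in range(n_iter):
        # Rule (1): messages from function nodes to variable nodes.
        for m, (vs, table) in enumerate(factors):
            for pos, n in enumerate(vs):
                msg = table
                for pos2, n2 in enumerate(vs):
                    if pos2 != pos:  # multiply by messages of the other inputs
                        shape = [1] * len(vs)
                        shape[pos2] = domain
                        msg = msg * q[(n2, m)].reshape(shape)
                other_axes = tuple(a for a in range(len(vs)) if a != pos)
                out = msg.sum(axis=other_axes) if other_axes else msg
                r[(m, n)] = out / out.sum()
        # Rule (2): messages from variable nodes to function nodes.
        for m, n in edges:
            out = np.ones(domain)
            for m2, n2 in edges:
                if n2 == n and m2 != m:
                    out = out * r[(m2, n)]
            q[(n, m)] = out / out.sum()
    beliefs = np.ones((n_vars, domain))
    for m, n in edges:
        beliefs[n] *= r[(m, n)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

# Toy graph: t = p XOR k on 4-bit values; variables 0 = p, 1 = k, 2 = t.
D = 16
t_idx, p_idx, k_idx = np.indices((D, D, D))
xor_table = (t_idx == (p_idx ^ k_idx)).astype(float)
hw = np.array([bin(v).count("1") for v in range(D)])

p_true, k_true = 0x3, 0xA
t_true = p_true ^ k_true
plaintext_node = (np.arange(D) == p_true).astype(float)  # known plaintext
leakage_node = (hw == hw[t_true]).astype(float)          # noise-free HW of t

factors = [((2, 0, 1), xor_table), ((0,), plaintext_node), ((2,), leakage_node)]
beliefs = belief_propagation(3, factors, D, n_iter=3)
key_belief = beliefs[1]
```

Since this toy graph is a tree, a few iterations suffice: the belief on $k$ becomes uniform over the six candidates that are consistent with the known plaintext and the observed Hamming weight, and zero elsewhere.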

3 Comparison with ASCA 

ASCA and SASCA are both analytical attacks with very similar descriptions. 
As previously shown in [38], SASCA have a clear advantage when only noisy 
information is available. But when the information is noise-free, the advantage 
of one over the other has not been studied yet. In this section, we therefore tackle 
the question “which analytical attack is most efficient in a noise-free scenario?”.
To this end, we compare the results of SASCA and ASCA against a simu- 
lated AES implementation with noise-free (Hamming weight) leakages. We first 
describe the AES representation we used in our SASCA (which will also be used 
in the following sections), then describe the different settings we considered for 
our simulated attacks, and finally provide the results of our experiments. 

3.1 Our Representation for SASCA 

Fig. 1. Graph representation of one column of the first AES round (operations
shown: AddRoundKey, Sbox, MixColumns).

As usual in analytical attacks, our description of the AES is based on its target
implementation. This allows us to easily integrate the information obtained
during its execution. For readability purposes, we start by illustrating the graph
representation for the first round of one column of the AES in Fig. 1. To build 
this graph for one plaintext, we start with 32 variable nodes (circles), 16 for 
the 8-bit subplaintexts ($p_i$), and 16 for the 8-bit subkeys ($k_i$). We first add a
new variable node in the graph representation each time a new intermediate
value is computed in the AES FURIOUS implementation, 2 together with the
corresponding function nodes (rectangles). There are three different operations
that create intermediate values. First, the Boolean XOR takes two variables as
inputs and outputs a new variable that is equal to the bitwise XOR of the two
inputs. Next, two memory accesses to look-up tables are used for the S-box and
Xtimes operations, which take one variable as input, and create a new variable
as output. We finally add two types of leaf nodes to these three function nodes.
The $\mathcal{P}$'s reflect the knowledge of the plaintext used, and the
$\mathcal{L}$'s give the posterior probability of the value observed using
Gaussian templates. A summary of the different function nodes used in our AES
factor graph is given in Table 1.

2 Excluding memory copies which only increase the graph diameter.


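Table 1. Summary of the function nodes used in our AES factor graph.

$$\mathrm{XOR}(a, b, c) = \begin{cases} 1 & \text{if } a = b \oplus c, \\ 0 & \text{otherwise.} \end{cases}$$

$$\mathrm{SBOX}(a, b) = \begin{cases} 1 & \text{if } a = \mathrm{sbox}(b), \\ 0 & \text{otherwise.} \end{cases}$$

$$\mathrm{XTIMES}(a, b) = \begin{cases} 1 & \text{if } a = \mathrm{xtimes}(b), \\ 0 & \text{otherwise.} \end{cases}$$

$$\mathcal{P}(x_n) = \begin{cases} 1 & \text{if } x_n = p, \\ 0 & \text{otherwise.} \end{cases}$$

$$\mathcal{L}(x_n) = \Pr[x_n | l(t_n)].$$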
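
The deterministic function nodes of Table 1 are simple indicator functions. The Python sketch below writes them out explicitly; the helper names and the way the S-box is recomputed from its algebraic definition (inversion in GF($2^8$) followed by the affine map, rather than a stored look-up table as in the target implementation) are choices made for this illustration only.

```python
def xtimes(b):
    """Multiply by x (i.e. 2) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b

def gf_mul(a, b):
    """Multiplication in GF(2^8), built from repeated xtimes."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtimes(a)
        b >>= 1
    return result

def sbox(b):
    """AES S-box: inverse in GF(2^8) (0 maps to 0) followed by the affine map."""
    inv = 0 if b == 0 else next(c for c in range(1, 256) if gf_mul(b, c) == 1)
    out = inv
    for shift in range(1, 5):
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out ^ 0x63

# Function nodes of the factor graph: 1 if the relation holds, 0 otherwise.
def XOR(a, b, c):
    return 1 if a == (b ^ c) else 0

def SBOX(a, b):
    return 1 if a == sbox(b) else 0

def XTIMES(a, b):
    return 1 if a == xtimes(b) else 0

def P(x_n, p):
    """Known-plaintext leaf node for a plaintext byte p."""
    return 1 if x_n == p else 0
```

The leakage node $\mathcal{L}$ is not an indicator: it simply returns the posterior probabilities produced by the templates for the corresponding variable.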


The graph in Fig. 1 naturally extends to a full AES execution. And when 
using several traces, we just keep a single description of the key scheduling, that 
links different subgraphs representing the different plaintext encryptions. Our 
description of the key scheduling requires 226 variable nodes and 210 function 
nodes. Our description of the rounds requires 1036 variable nodes and 1020 
function nodes. The key scheduling nodes are connected by 580 edges, and each 
round of the encryption contains 292 edges. As a result and overall, the factor 
graph for one plaintext contains 1262 variable nodes, 1230 function nodes and 
3628 edges. On top of that, we finally add the leakage function nodes, which
account for up to 1262 edges (if all leakages are exploited). Concretely, each
variable node represents an intermediate value that can take $2^8$ different values.
Hence, if we represent each edge by two tables in single precision of size 256, the
memory required is: 256 × (3628 × 2 + 1262) × 4 bytes ≈ 8 MB.
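
The memory estimate above can be reproduced with a short computation. The variable names are hypothetical; the figures are those given in the text (two message tables per edge of the graph, a single one for the leakage edges).

```python
domain = 2 ** 8          # number of values per 8-bit variable
edges = 3628             # edges of the factor graph for one plaintext
leakage_edges = 1262     # at most one leakage node per variable
bytes_per_value = 4      # single precision

total_bytes = domain * (edges * 2 + leakage_edges) * bytes_per_value
total_mib = total_bytes / 2 ** 20
```

This gives 8,722,432 bytes, i.e. slightly more than 8 MiB per plaintext.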


3.2 Comparison Setup 

Our noise-free evaluations of ASCA and SASCA are based on single-plaintext 
attacks, which is due to the high memory requirements of ASCA (that hardly 
extend to more plaintexts). In order to stay comparable with the previous work 
in [32], we consider a Hamming weight ($W_H$) leakage function and specify the
location of the leakages as follows:

- 16 $W_H$s for AddRoundKey,

- 16 $W_H$s for the output of SubBytes and ShiftRows,

- 36 $W_H$s for the XORs and 16 $W_H$s for the look-up tables in MixColumns.


3 For the leakage nodes, messages from variable to function ($q_{v_n \to f_m}$)
are not necessary.

As previously mentioned, these leakages are represented by $\mathcal{L}$ boxes in
Fig. 1. We also consider two different contexts for the information extraction:

- Consecutive weights (cw), i.e. the $W_H$s are obtained for consecutive rounds.

- Random weights (rw), i.e. we assume the knowledge of $W_H$s for randomly
distributed intermediate values among the 804 possible ones.

Eventually, we analyzed attacks in a Known Plaintext (KP) and Unknown Plaintext
(UP) scenario. And in all cases, we excluded the key scheduling leakages, as
in [32]. Based on these settings, we evaluated the success rate as a function of the
quantity of information collected, counted in terms of “rounds of information”,
where one round corresponds to 84 $W_H$s of 8-bit values.
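
The figure of 84 Hamming weights per round follows directly from the list of leakage locations given above. The short check below also shows one accounting that is numerically consistent with the 804 possible intermediate values; this breakdown (nine full rounds, a last round without MixColumns, and one additional AddRoundKey) is an assumption of this sketch and is not spelled out in the text.

```python
add_round_key = 16
sub_bytes_shift_rows = 16
mix_columns_xors = 36
mix_columns_lookups = 16

per_round = add_round_key + sub_bytes_shift_rows + mix_columns_xors + mix_columns_lookups

# Assumed breakdown of the 804 values: 9 full rounds, a final round without
# MixColumns (AddRoundKey + SubBytes/ShiftRows), and one extra AddRoundKey.
assumed_total = 9 * per_round + (add_round_key + sub_bytes_shift_rows) + add_round_key
```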


3.3 Experimental Results 

The results of our SASCA with noise-free leakages are reported in Fig. 2, and 
compared to the similar ASCA experiments provided in Reference [32]. 

We first observe that 2 consecutive rounds of $W_H$s are enough to recover
the key for SASCA with the knowledge of plaintext and when the leakages are
located in the first rounds. 4 Next, if we do not have access to the plaintext,
SASCA requires 3 consecutive rounds of leakage, as for ASCA. By contrast,
and as previously underlined, the solving/decoding phase is significantly more
challenging in case the leakage information is randomly distributed among the
intermediate variables. This is intuitively connected to the fact that the solver
and decoder both need to propagate information through the rounds, and
that this information can rapidly vanish in case some intermediate variables
are unknown. The simplest example is a XOR operation within MixColumns,
as mentioned in the introduction. So accumulating information on closely connected
intermediate computations is always the best approach in such analytical attacks.
This effect is of course amplified if the leakages are located in the middle rounds
and the plaintext/ciphertext are unknown, as clear from Fig. 2.
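
The way information vanishes through a XOR can be illustrated numerically. In the sketch below, the message sent by a XOR node towards its output is the distribution of $a \oplus b$ for two independent inputs; the entropy (in bits) of this message measures how much uncertainty remains. The choice of inputs (a noise-free Hamming weight of 4 on a byte) and the function names are hypothetical.

```python
import numpy as np

def xor_message(p_a, p_b):
    """Distribution of a XOR b for independent a ~ p_a and b ~ p_b."""
    size = len(p_a)
    idx = np.arange(size)
    return np.array([np.sum(p_a * p_b[idx ^ c]) for c in range(size)])

def entropy_bits(p):
    """Shannon entropy of a distribution, in bits."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

hw = np.array([bin(v).count("1") for v in range(256)])
informative = (hw == 4) / np.sum(hw == 4)  # byte known to have Hamming weight 4
uniform = np.full(256, 1 / 256)            # byte without any leakage

both_known = xor_message(informative, informative)
one_unknown = xor_message(informative, uniform)
```

When one input is unknown, the output message is exactly uniform (8 bits of entropy), whatever is known about the other input. Even when both inputs have a known Hamming weight, the output is more uncertain than either input, which is why weakly leaking XORs limit the propagation of information.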

Overall, and since both SAT-solvers and the BP algorithm with loops in the 
factor graph are highly heuristic tools, it is of course difficult to make strong 
statements about their respective leakage requirements. However, these experi- 
ments confirm that at least in the relevant case-study of Hamming weight AES 
leakages, the better noise-resilience of SASCA does not imply weaker
performances in a noise-free setting. Besides, and in terms of time complexity, the
attacks also differ. Namely, the resolution time for ASCA depends on the
quantity of information, whereas it is independent of this quantity in SASCA, and
approximately 20 times lower than the fastest resolution times for ASCA.

Note finally that moving to a noisy scenario can only be detrimental to ASCA. 
Indeed, and as discussed in [26], ASCA requires correct hard information for the 

4 We considered leakages for the first two rounds in this case, which seems more
natural, and is the only minor difference with the experiments in [32], which
considered middle rounds. However, we note that by considering middle round
leakages with known plaintext, we then require three rounds of $W_H$s, as for ASCA.

Fig. 2. Experimental results of comparison of ASCA and SASCA.


key recovery to succeed. In case of noisy measurements, this can only be guar- 
anteed by considering less informative classes of leakages or similar heuristics. 
For example, previous works in this direction considered Hamming weights $h$'s
between $h - d$ and $h + d$ for increasing distances $d$'s, which rapidly makes the
attack computationally hard (and cannot be mitigated with multiple plaintext
leakages because of the high RAM requirements of ASCA). So the efficiency gain 
of SASCA over ASCA generally increases with the measurement noise. 
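
The loss of information caused by tolerating Hamming weights between $h - d$ and $h + d$ can be quantified by counting the 8-bit values compatible with each class. The following Python sketch (function name hypothetical) does so with binomial coefficients.

```python
from math import comb

def candidates_within(h, d, bits=8):
    """Number of `bits`-bit values whose Hamming weight lies in [h - d, h + d]."""
    return sum(comb(bits, w) for w in range(max(0, h - d), min(bits, h + d) + 1))

exact_class = candidates_within(4, 0)    # only weight 4
tolerant_1 = candidates_within(4, 1)     # weights 3 to 5
tolerant_2 = candidates_within(4, 2)     # weights 2 to 6
```

For $h = 4$, the number of compatible byte values grows from 70 ($d = 0$) to 182 ($d = 1$) and 238 ($d = 2$) out of 256, so each leakage quickly becomes almost uninformative for the solver.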

4 SASCA Against a Concrete AES Implementation 

In this section, we complete the previous simulated experiments and explore 
whether SASCA can be transposed to the more realistic context of measured
leakages. To the best of our knowledge, we describe the first uses of SASCA 
against a concrete AES implementation, and take advantage of this case-study 
to answer several questions such as (i) how to perform the profiling of the many 
target intermediate values in SASCA?, (ii) what happens when the implementation
details (such as the source code) are unknown?, and (iii) are there significant
differences (or even gaps) between concrete and simulated experiments?


4.1 Profiling Step 

We first describe how to exploit the tools from Sect. 2.2 in order to detect POIs 
for our 1230 target intermediate values (which correspond to 1262 variable nodes 
minus 32 corresponding to the 16 bytes of plaintext and ciphertext). In this con- 
text, directly computing the SNRs or CPAs in parallel for all our samples turns 
out to be difficult. Indeed, computing the mean trace of an intermediate value
in single precision requires 94,000 (samples) × 256 (values) × 4 (bytes) ≈ 91 MB
of memory, which means approximately 100 GB for the 1,230
values. For similar reasons, computing all these SNRs or CPAs sequentially is
not possible (i.e. would require too much time). So the natural option is to trade
time and memory by cutting the traces in a number of pieces that fit in RAM.
This is easily done if we can assume some knowledge about the implementation
(which we did), resulting in a relatively easy profiling step carried out in about
a dozen hours on a single desktop computer. A similar profiling could be performed
without implementation knowledge, by iteratively testing the intermediate val- 
ues that appear sequentially in an AES implementation. 
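
The memory figures above and the idea of cutting the traces into pieces can be sketched as follows. The function `class_means_chunked` is a hypothetical helper that computes the mean trace of each value of one intermediate variable window by window, so that only one window of accumulators is held in memory at a time; the toy dimensions are unrelated to the actual measurements.

```python
import numpy as np

# Memory needed for the mean traces, with the figures given in the text.
bytes_one_value = 94_000 * 256 * 4       # one intermediate value
bytes_all_values = bytes_one_value * 1_230

def class_means_chunked(traces, values, n_values, chunk):
    """Yield (start, means) where means[x] is the mean of the current window
    of samples over the traces whose intermediate value equals x."""
    counts = np.maximum(np.bincount(values, minlength=n_values), 1)
    for start in range(0, traces.shape[1], chunk):
        window = traces[:, start:start + chunk]
        sums = np.zeros((n_values, window.shape[1]), dtype=np.float64)
        np.add.at(sums, values, window)  # accumulate each trace into its class
        yield start, sums / counts[:, None]

# Toy check (hypothetical data): the chunked result matches a direct computation.
rng = np.random.default_rng(0)
toy_traces = rng.normal(size=(500, 37)).astype(np.float32)
toy_values = rng.integers(0, 4, size=500)
chunked = np.hstack([m for _, m in class_means_chunked(toy_traces, toy_values, 4, chunk=10)])
direct = np.stack([toy_traces[toy_values == x].mean(axis=0) for x in range(4)])
```

With knowledge of the implementation, the windows can be restricted to the part of the trace where a given intermediate value is expected to be manipulated, which is what makes the profiling tractable.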

A typical outcome of this profiling is given in Fig. 3, where we show the
SNR we observed for the intermediate value $t_1$ from the factor graph in Fig. 1
(i.e. the value of the bitwise XOR of the first subkey and the first subplaintext).
As intuitively expected, we can identify significant leakages at three different
times. The first one, at t = 20,779, corresponds to the computation of the value
$t_1$, i.e. the XOR between $p_1$ and $k_1$. The second one, at t = 22,077, corresponds
to the computation of the value $s_1$, i.e. a memory access to the look-up table
of the S-box. The third one, at t = 24,004, corresponds to memory copies of
$s_1$ during the computation of MixColumns. Indeed, the SNR cannot tell apart
intermediate values that are bijectively related. So we used the CPA distinguisher
to get rid of this limitation (taking advantage of the fact that a simple Hamming
weight leakage model was applicable against our target implementation).

Fig. 3. SNR-based profiling of a single intermediate value.


A summary of the results obtained after our profiling step is given in Table 2, 
where the most interesting observation is that the informativeness of the leakage 
samples strongly depends on the target intermediate values. In particular, we 
see that memory accesses allow SNRs over 2, while XOR operations lead to 
SNRs below 0.4 (and this SNR is further reduced in case of consecutive XOR 
operations). This is in strong contrast with the simulated cases (in the previous
section and in [38]), where all the variables were assumed to leak with the same 
SNR. Note that the table mentions both SNR and CPA values, though our

Table 2. Summary of profiling step results.

Assembly code              Graph description      SNR      ρ(W_H)

Add round key
ld  H1, Y+                 *                      *        *
eor ST11, H1               _Xor t1 p1 k1          0.1493   0.5186

Sbox
ldi ZH, high(sbox<<1)      *                      *        *
mov ZL, ST11               *                      *        *
lpm ST11, Z                _Sbox s1 t1            1.6301   0.4766

MixColumns
ldi ZH, high(xtime<<1)     *                      *        *
mov H1, ST11               *                      *        *
eor H1, ST21               _Xor h1 s1 s2          0.1261   0.6158
eor H1, ST31               _Xor h2 h1 s3          0.0391   0.1449
eor H1, ST41               _Xor h3 h2 s4          0.3293   0.5261
mov H2, ST11               *                      *        *
mov H3, ST11               *                      *        *
eor H3, ST21               _Xor mc1 s1 s2         0.2802   0.6163
mov ZL, H3                 *                      *        *
lpm H3, Z                  _Xtime xt1 mc1         2.8650   0.6199
eor ST11, H3               _Xor cm1 xt1 s1        0.0723   0.2508
eor ST11, H1               _Xor p17 cm1 h3        0.1064   0.3492

Key schedule
ldi H1, 1                  *                      *        *
ldi ZH, high(sbox<<1)      *                      *        *
mov ZL, ST24               *                      *        *
lpm H3, Z                  _Sbox sk14 k14         2.2216   0.5553
eor ST11, H3               _Xor ak1 sk14 k1       0.1158   0.5291
eor ST11, H1               _XorCste k17 ak1 1     0.3435   0.5140

selection of POIs was based on the (more generic) first criterion, and CPA was
only used to separate the POIs of bijectively related intermediate values. 5


4.2 Experimental Results 

Taking advantage of the previous POI detection, we now want to discuss the 
consequences of different assumptions about the implementation knowledge. These 
investigations are motivated by the usual gap between Kerckhoffs's laws [18], which


5 We used a relatively noisy setup on purpose (e.g. we did not filter our measurements), 
in order to magnify the effectiveness of SASCA in such challenging contexts.

advise keeping the key as the only secret in cryptography, and the practice in
embedded security, that usually takes advantage of some obscurity regarding the
implementations. For this purpose, we considered three adversaries:

1. Informed. The adversary has access to the implementation details (i.e. source 
code), and can exploit the leakages of all the target intermediate values. 

2. Informed, but excluding the key scheduling. This is the same case as the pre- 
vious one, but we exclude the key scheduling leakages as in the simulations 
of the previous section (e.g. because round keys are precomputed). 

3. Uninformed. Here the adversary only knows the AES is running, assumes it 
is implemented following the specifications in [11], and only exploits generic 
operations (i.e. the inputs and outputs of AddRoundKey, SubBytes, ShiftRows
and MixColumns, together with the key rounds’ inputs and outputs). 

In order to have fair comparisons, we used the same profiling for all three cases 
(i.e. we just excluded some POIs for cases 2 and 3), and we used 100 sets of 30 
traces with different keys and plaintexts to calculate the success rate of SASCA 
in these different conditions. The results of our experiments are in Fig. 4. Our first 
and main observation is that SASCA are applicable to actual implementations, 
for which the leakages observed provide more or less information (and SNR) 
depending on the intermediate values. As expected, the informed adversary is 
the most powerful. But we also see that excluding the key scheduling leakages, or 
considering an uninformed adversary, only marginally reduces the attack success 
rates. Interestingly, there is a strong correlation between this success rate and the 
number of leakage samples exploited, since excluding the key scheduling implies 
the removal of 226 leakage function nodes, and the uninformed adversary has 
540 fewer leakage function nodes than the informed one (mostly corresponding 
to the MixColumns operation). So we can conclude that SASCA are not only 
a threat for highly informed adversaries, and in fact quite generically apply to 
unprotected software implementations with many leaking points. 


Simulation Vs. Measurement. In view of the previous results, with infor- 
mation leakages depending on the target intermediate values, a natural question 
is whether security against SASCA was reasonably predicted with a simulated 
analysis. Of course, we know that in general, analytical attacks are much harder 
to predict than DPA [31], and do not enjoy simple formulas for the prediction of 
their success rates [22]. Yet, we would like to study informally the possible con- 
nection between simple simulated analyses and concrete ones. For this purpose, 
we compare the results obtained in these two cases in Fig. 5. For readability, we 
only report results for the informed and uninformed cases, and consider different 
SNRs for the simulated attacks. In this context, we first recall Table 2 where the 
SNRs observed for our AES implementation vary between $2^{1}$ and $2^{-2}$. Interestingly, 
we see from Fig. 5 that the experimental success rate is indeed bounded by 
these extremes. (Tighter and more rigorous bounds are probably hard to obtain 
for such heuristic attacks). Besides, we also observe that the success rates of the 

Fig. 4. Success rate as a function of the number of traces for different adversaries: informed 
(*), informed without key scheduling leakages, and uninformed. 

measurements and simulations are closer in the case of the uninformed adver- 
sary, which can be explained by the fact that we essentially ignore MixColumns 
leakages in this case, for which the SNRs are lower. 

5 Comparison with DPA and Enumeration 

In this section, we start from the observation that elegant approaches to side- 
channel analysis generally require more computational power than standard 
DPA. Thus, a fair comparison between both approaches should not only look at 
the success rate as a function of the number of traces, but also take into account 
the resolution time as a parameter. As a result, and in order to compare SASCA 
and the pragmatic DPA on a sound basis, this section investigates the result of 
DC attacks combined with computational power for key enumeration. 


5.1 Evaluation of Profiled Template Attacks 

In order to be as comparable as possible with the previous SASCA, our comparison 
will be based on the profiled TA described in Sect. 2.3. 6 More precisely, 
we considered a quite pragmatic DC attack exploiting the bivariate leakages 
corresponding to the AddRoundKey and SubByte operations (i.e. the two corresponding 
intermediate values in Fig. 1). We can take advantage of the same detection of POIs as 
described in the previous section for this purpose. This choice allows us to keep 
the computational complexity of the TA itself very minimal (since relying only 
on 8-bit hypotheses). As previously mentioned, it also aims to make comparison 

6 We considered TA for our DPA comparison because they share the same profiled 
setting as SASCA. Comparisons with a non-profiled CPA can only be beneficial to 
SASCA. More precisely, we expect a typical loss factor of 2 to 5 between ($W_H$-based) 
CPA and TA, according to the results in [35] obtained on the same device. 

Fig. 5. Experimental results for SASCA for an informed adversary (a) and an uninformed 
adversary (b). Red curves are for simulated cases with SNR in $\{2^{1}, 2^{-1}, 2^{-2}, 2^{-3}\}$. 
Blue curves (•) are for experiments on real traces (Color figure online). 

as meaningful as possible (since we compare two attacks with one sample per 
target operation that only differ by their number of target operations). Next, 
we built the security graph of our bivariate TA, as represented in Fig. 6, 
where the white (resp. black) curve corresponds to the maximum (resp. minimum) 
rank observed, and the red curve is for the average rank. It indicates that 
approximately 60 plaintexts are required to recover the key without any enumeration 
(which is in line with Footnote 5). But more interestingly, the graph 
also highlights that allowing enumeration up to ranks of (e.g.) $2^{30}$ reduces 
the required number of measured traces to approximately 10. 
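
The way a security graph is read against an enumeration budget can be sketched in a few lines of Python. This is only an illustration: the rank samples below are made up (they are not the measurements behind Fig. 6), the helper names `rank_summary` and `success_rate` are hypothetical, and the summary averages log2 ranks rather than raw ranks for readability.

```python
import math

def rank_summary(ranks):
    """Return (min, mean, max) of log2(rank) over repeated experiments."""
    logs = [math.log2(r) for r in ranks]
    return min(logs), sum(logs) / len(logs), max(logs)

def success_rate(ranks, log2_budget):
    """Fraction of experiments whose key rank fits in the enumeration budget."""
    budget = 2 ** log2_budget
    return sum(1 for r in ranks if r <= budget) / len(ranks)

# Hypothetical key ranks (1 = correct key ranked first), indexed by the
# number of traces used in the attack. These values are illustrative only.
ranks_by_traces = {
    10: [2**28, 2**12, 2**33, 2**25],
    60: [1, 1, 1, 1],
}

for q, ranks in ranks_by_traces.items():
    no_enum = success_rate(ranks, 0)      # key must be ranked first
    with_enum = success_rate(ranks, 30)   # enumeration up to rank 2^30
    print(q, rank_summary(ranks), no_enum, with_enum)
```

With such data, the success rate without enumeration is the fraction of experiments with rank 1, whereas granting a budget of $2^{30}$ counts every experiment whose rank does not exceed that budget.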


5.2 Comparing SASCA and DPA with Enumeration 

In our prototype implementation running on a desktop computer, SASCA 
requires roughly one second per plaintext, and reaches a success rate of one after 
20 plaintexts (for the informed adversary). In order to allow reasonably fair comparisons, 
we first measured that the same desktop computer can perform a bit 
more than $2^{20}$ AES encryptions in 20 seconds. So this is typically the amount of 
enumeration that we should grant the bivariate TA for comparisons with SASCA. 7 
For completeness, we also considered the success rates of bivariate TA without enumeration 
and with $2^{30}$ enumeration power. 8 The results of these last experiments 

7 We do not take the (time and memory) resources required for the generation of the 
list of the most probable keys to enumerate into account in our comparisons, since 
these resources remain small in the total enumeration cost. Using the state-of-the-art 
enumeration algorithm [36], we required 2.7 MB and 0.55 seconds to generate a list of 
$2^{20}$ keys, and 1.8 GB and 3130 seconds to generate a list of $2^{32}$ keys. 

8 Which is also more than allowed by the new suboptimal key enumeration in [3]. 

Fig. 6. Security graph of a bivariate TA. 

are in Fig. 7. Overall, they bring an interesting counterpart to our previous inves- 
tigations. On the one hand, we see that SASCA remains the most powerful attack 
when the adversary has enough knowledge of the implementation. By contrast in 
the uninformed case, the gain over the pragmatic TA with enumeration is lower. 
So as expected, it is really the amount and type of leakage samples exploitable 
by the adversary that make SASCA more or less powerful, and determine their 
interest (or lack thereof) compared to DC attacks. In this respect, a meaningful 
observation is that the gap between SASCA and DPA without enumeration (here 
approximately 5) is lower than the approximate factor 10 that was observed in 
the previous simulations of [38]. This difference is mainly due to the lower SNRs 
observed in the MixColumns transform. 

Eventually, we note that in view of these results, another natural approach 
would be to use enumeration for SASCA. Unfortunately, our experiments have 
shown that enumeration is much less effective in the context of analytical attacks. 
This is essentially caused by the fact that DC attacks consider key bytes inde- 
pendently, whereas SASCA decode the full key at once, which implies that the 
subkey probabilities are not independent in this case, and can be degraded when 
running loopy BP for too long. Possible ways to address this issue include the 
use of list decoding algorithms for LDPC codes (as already mentioned in [13]), or 
enumeration algorithms that can better take subkey dependencies into account 
(as suggested in [19] for elliptic curve implementations). 

6 Conclusion and Open Problems 

This paper puts forward that the technicalities involved in elaborate analytical 
side-channel attacks, such as the recent SASCA, are possible to solve in prac- 
tice. In particular, our results show that the intensive profiling of many target 
intermediate values within an implementation is achievable with the same (SNR 
& CPA) tools as any profiled attack (such as the bivariate TA we considered). 
This profiling only requires about a dozen hours to complete, and then enables very 
efficient SASCA that recover the key of our AES implementation in a couple 

Fig. 7. Comparison between elegant and pragmatic approaches. 

of seconds and traces, using a single desktop computer. Furthermore, these suc- 
cessful attacks are even possible in a context where limited knowledge about the 
target implementation is available, hence mitigating previous intuitions regarding 
analytical attacks being “only theoretical”. Besides this positive conclusion, 
a fair comparison with DC attacks also highlights that the gap between a bivari- 
ate TA and a SASCA can be quite reduced in case enumeration power is granted 
to the DC adversary, and several known plaintexts are available. Intuitively, the 
important observation in this respect is that the advantage of SASCA really 
depends on the amount and type of intermediate values leaking information, 
which highly depends on the algorithms and implementations analyzed. 

The latter observation suggests two interesting directions for further research. 
On the one hand, the AES Rijndael is probably among the most challenging tar- 
gets for SASCA. Indeed, it includes a strong linear diffusion layer, with many 
XOR operations through which the information propagation is rapidly amor- 
tized. Besides, it also relies on a non-trivial key scheduling, which prevents the 
direct combination of information leaked from multiple rounds. So it is not 
impossible that the gap between SASCA and standard DPA could be larger 
for other ciphers (e.g. with permutation-based diffusion layers [4], and very minimal 
key scheduling algorithms [5]). On the other hand, since the propagation 
of the leakage information through the MixColumns operation is hard(er), one 
natural solution to protect the AES against such attacks would be to enforce 
good countermeasures for this part of the cipher, which would guarantee that 
SASCA do not exploit more information than the one of a single round. Ideally, 
and if one can prevent any information propagation beyond the cipher rounds, 
we would then have a formal guarantee that SASCA is equivalent to DPA. 
Abstract. Side channels provide additional information to skilled 
adversaries that reduces the effort to determine an unknown key. If sufficient 
side channel information is available, identification of the secret key 
can even become trivial. However, if not enough side information is available, 
some effort is still required to find the key in the key space (which 
now has reduced entropy). To understand the security implications of 
side channel attacks it is then crucial to evaluate this remaining effort 
in a meaningful manner. Quantifying this effort can be done by looking 
at two key questions: first, how ‘deep’ (at most) is the unknown key in 
the remaining key space, and second, how ‘expensive’ is it to enumerate 
keys up to a certain depth? 

We provide results for these two challenges. Firstly, we show how to 
construct an extremely efficient algorithm that accurately computes the 
rank of a (known) key in the list of all keys, when ordered according to 
some side channel attack scores. Secondly, we show how our approach 
can be tweaked such that it can be also utilised to enumerate the most 
likely keys in a parallel fashion. We are hence the first to demonstrate 
that a smart and parallel key enumeration algorithm exists. 


Keywords: Key enumeration • Key rank • Side channels 


1 Introduction 

Side channel attacks have proven to be a hugely popular research topic, as the 
proliferation of new venues such as CHES, COSADE and HOST shows. Much of 
the published research is about key recovery attacks utilising side channel infor- 
mation. Key recovery attacks essentially take a number of side channel observa- 
tions, colloquially referred to as ‘traces’, and apply a so-called distinguisher to 
traces that assigns scores to keys. An attack is considered (first-order) successful 
given a set of traces, if the actual secret key receives the highest score. Besides 
describing methods (i.e. the distinguishers) that recover the secret key from the 
available data, papers focus on the question of how many traces are required for 
successful first-order attacks. 
The trade-off chosen in most work is, hence, to increase the number of traces 
to ensure that the secret key is recovered successfully with almost certainty. As 
observed by Veyrat-Charvillon et al. [13] in their seminal paper on optimal key 
enumeration, this might not be the trade-off that a well resourced adversary 
would choose. Suppose that access to the side channel is scarce or difficult. In 
such a case the actual secret key might not have the highest score after utilising 
the leakage traces, but it might still have a higher score than many other keys. 
Now imagine that the adversary can utilise substantial computational resources. 
This implies that by searching through the key space (in order of decreasing 
scores; we call this smart key enumeration) the adversary would find the secret 
key much faster than by a naive brute-force search (i.e. one that treats all keys 
as equally likely). Consequently, the true security level of an implementation 
cannot be judged solely by its security against first-order side channel attacks. 
Instead it is important to understand how the number of traces impacts on the 
effort required for a smart key enumeration. 

We now illustrate this motivation by linking it to evaluating the impact of 
the most influential type of side channel attack: Differential Power Analysis. 

1.1 Evaluating Resistance Against Differential Power Analysis 

Differential Power Analysis (DPA) [9] consists of predicting a so-called target 
function, e.g. the output of the Substitution Boxes, and mapping the output of 
this function to ‘predicted side channel values’ using a power model. For this 
process it is not necessary to know or guess the whole secret key, SK. One 
only needs to make guesses about ‘enough bits’. The predicted values for a key 
chunk are then ‘compared’ to the real traces (point-wise) using a distinguisher. 
Assuming enough traces are available, the value that represents the correct key 
guess will lead to a ‘higher’ distinguishing value. In Kocher et al.'s original 
paper [9] this was illustrated for the DES cipher, but most contemporary research 
uses AES as running example. 

With respect to AES: Kocher's attack consists of using a t-test statistic as a 
distinguisher to compute scores for the values of each 8-bit chunk of the 128-bit 
key; see Fig. 1 for a visual example. Here we have m = 16 chunks, each contain- 
ing n = 256 values, with associated distinguishing scores as derived via a t-test 
statistic. In the graphical illustration, the secret key values are marked out in grey. 

If sufficient side information is available, the values of the chunks that corre- 
spond to the secret key will have by far the highest distinguishing scores, such 
as the majority of key chunks in our graphical illustration. In this case the secret 
key can be trivially found (it is the concatenation of the chunks that lead to the 
uniquely highest score). However, if less side information is available, the scores 
may not necessarily favour a single key. Nevertheless, an adversary is still able to 
utilise these scores to smartly enumerate and then test keys (by using a known 
plaintext-ciphertext pair). 

Fig. 1. Score vectors for $m$ key chunks. Each chunk can take values from 0 to $n-1$, and 
scores $d_{i,j}$ are on a scale that depends on the side channel distinguisher. The values 
that correspond to the (hypothetical) secret key are highlighted in grey. 

Security Evaluations. Considering the perspective of a security evaluator, it is 
obviously important to characterise the remaining security of an implementation 
after leakage. The evaluator (who can run experiments with a known key, and 
a varying number of traces) wants to compute its position in a ranked list of all 
keys. Knowing this position allows the evaluator to assess the amount of effort 
required by an adversary performing a smart search (given some distinguishing 
vectors). Ideally, the evaluator is able to compute the ranks of arbitrarily deep 
keys. 

Accuracy and Efficiency are Key Requirements: Naturally, because the evalua- 
tor performs concrete statistical experiments, a single run of a single attack is 
not sufficient to gather sound evidence. In practice, any side channel experiment 
needs to be repeated multiple times by the evaluation lab, and many different 
attacks need to be carried out, utilising different amounts of side channel traces. 
Having the capability to determine the position of the key in a ranked list accu- 
rately (rather than just giving an estimation), and efficiently, is crucial to cor- 
rectly assess the effort of a real world adversary. Previous works’ algorithms [6, 14] 
were capable of estimating the key rank within some bound. We demonstrate 
that we are accurate when enough precision is used, and importantly, we put 
forward the first approach for parallel and smart key enumeration. 


1.2 Problem Statement and Notation 

We use a bold type face to denote multi-dimensional entities. Indices in super- 
script refer to column vectors (we use the variable j for this purpose), and 
indices in subscript refer to row vectors (we use i for this purpose). Two indices 
i, j refer to an element in row i and column j. To maintain an elegant layout, 
we sometimes typeset column vectors ‘in line’, and then indicate transposition 
via a superscript $T$, as in $\mathbf{k} = (\ldots)^T$. 

We partition a key guess $\mathbf{k}$ into $m$ chunks, each able to take one of $n$ possible values, 
i.e. $\mathbf{k} = (\mathbf{k}^0, \ldots, \mathbf{k}^{m-1})$, and $\mathbf{k}^j = (k_{0,j}, k_{1,j}, \ldots, k_{n-1,j})^T$. After exploiting 
some leakage $L$ all chunks $\mathbf{k}^j$ have some corresponding score vectors, i.e. we know 
the score for each guess $k_{i,j}$ is $d_{i,j}$ after leakage. For convenience we use the variable 
$sk_j$ to refer to the indices (in each chunk) that correspond to the correct key, 
i.e. $SK = (k_{sk_0,0}, k_{sk_1,1}, \ldots, k_{sk_{m-1},m-1})$. The score $D$ of the secret key is then 

$D = \sum_{j=0}^{m-1} d_{sk_j,j}$. We will later map scores to (integer) weights and the weight of 
the secret key will be $W$. 

The rank of a key is defined as the position of the key in the ordering (of all 
keys), where keys with the exact same weight are ranked ‘ex aequo’. In principle, 
any order of these equally ranked keys is permissible, so one is free to make a 
choice about this order. Assuming the correct key is ranked first among all keys of 
the same weight requires us to count all keys with weight less than W. It implies 
that the rank we return is conservative in the following sense: key ranking is used 
to evaluate the security of side-channel attacks; our assumption on the ordering 
implies we give a side-channel adversary the benefit of the doubt (and so we 
deem it slightly more successful than it can be in reality). As an alternative, 
one could assume the correct key is ranked last among all keys of the same 
weight (since we use integer weights, this can be done by increasing the weight 
by one, counting all keys according to the ranked-first method, and subtracting 
one from the returned rank); ranking the candidate key both as first and last of 
its weight will lead to an interval of ranks covering all keys of that weight. Thus 
our choice (rank first) is effectively without loss of generality: run once it gives a 
conservative estimate, run twice it gives the exact interval of possible ranks for 
the candidate key. 

Definition 1 (Key Rank Computation). Given m vectors of n distinguish- 
ing scores, and the score D of the secret key SK, count the number of keys with 
score strictly larger than D. 

Definition 2 (Smart Key Enumeration). Given m vectors of n distinguish- 
ing scores, list the B keys with the highest score. 
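
For intuition, both definitions can be implemented by exhaustive search when m and n are tiny. The sketch below is only such a brute-force reference (it is not the algorithm developed in this paper and does not scale beyond toy sizes); the score values and the function names `key_rank` and `enumerate_keys` are assumptions chosen for illustration.

```python
from itertools import product

def total_score(scores, key):
    """Sum the per-chunk scores selected by `key` (one value index per chunk)."""
    return sum(scores[j][key[j]] for j in range(len(scores)))

def key_rank(scores, secret):
    """Definition 1: count keys with score strictly larger than the secret key's."""
    d_secret = total_score(scores, secret)
    all_keys = product(*(range(len(col)) for col in scores))
    return sum(1 for key in all_keys if total_score(scores, key) > d_secret)

def enumerate_keys(scores, budget):
    """Definition 2: list the `budget` keys with the highest score."""
    all_keys = product(*(range(len(col)) for col in scores))
    ranked = sorted(all_keys, key=lambda k: -total_score(scores, k))
    return ranked[:budget]

# Hypothetical integer scores: m = 2 chunks (outer list), n = 3 values each.
scores = [[9, 5, 1], [7, 2, 6]]
secret = (1, 2)  # value index of the correct key in each chunk

print(key_rank(scores, secret))
print(enumerate_keys(scores, 3))
```

The exhaustive loop visits all $n^m$ keys, which is exactly the cost that the remainder of the paper aims to avoid.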


1.3 Outline and Our Contributions 

In a nutshell, we utilise an elegant mapping of the key rank computation problem 
to a knapsack problem, which can be simplified and expressed as (efficient) path 
counting. As a result, we can compute accurate key ranks, and importantly, 
this enables us to put forward the first algorithm that can perform smart key 
enumeration in a parallel manner. 

Our contribution is structured in four main sections as follows: 

Casting the Key Enumeration as an Integer Knapsack. In Sect. 2 we 
show how to cast the key enumeration problem as a solution to counting 
knapsack solutions. In particular, we develop the representation of key rank 
as a multi-dimensional knapsack, and discuss its resulting graph representa- 
tion. Whilst the final definition can be represented as an integer program- 
ming problem, we choose to frame each step as an extension of the knapsack 
problem, for intuition. 
A Key Rank Algorithm. In Sect. 3 we map the multi-dimensional knapsack 
to a directed acyclic graph. We can therefore count solutions to the 
multi-dimensional knapsack problem by counting paths in the graph. The 
restriction of picking one item per chunk keeps the number of vertices in 
the directed acyclic graph small. As the graph is compact, and each node 
has at most two outgoing edges, the path counting problem can be solved 
efficiently in $O(m^2 \cdot n \cdot W \cdot \log n)$. 

Smart Key Enumeration. In Sect. 4, with the additional book-keeping of 
storing the vertices we visit, we can enumerate the B most likely keys with 
complexity $O(m^2 \cdot n \cdot W \cdot B \cdot \log n)$. We then show several techniques to 
make this process as efficient as possible. 

Practical Evaluation and Comparison with Previous Work. In Sect. 5 
we discuss requirements around precision. The main factor that influences 
performance is the size of the key rank graph, which is determined by the 
precision of the initial mapping and the weight of the target key. We compare 
our work with previous works in terms of precision and speed with regards 
to the key rank algorithm, and in terms of speed with regards to smart key 
enumeration. 

A full version of this paper can be found on ePrint 1 , where we consider 
additional alternative topological sorting methods and provide pseudo-code for 
each. Implementation details and testing methodologies are also considered in 
greater depth. 


1.4 Previous Work 

Key Rank. A naive approach is to simply remove a number of the 
least likely key values from each key chunk, so that the size of the search space is 
restricted as n is reduced. However, there are inherent problems with this approach. 
Firstly, this may remove valid high-ranking keys, as it is possible that a key 
may be constructed from one very low ranked value in one key chunk, and very 
high ranked values in the others. Secondly, it still relies on a simple brute-force approach, and 
even with a reduced n value this approach is thus too expensive to be practical. 
Finally, if the target key is deep, this approach will not work at all, as it is possible 
that the correct key values have been removed. 

Veyrat-Charvillon et al. [14] were the first to demonstrate an algorithm to 
estimate the rank of the key without using full key enumeration. The search 
space can be represented as a multidimensional space, with each dimension cor- 
responding to a key chunk, sorted by decreasing likelihoods. The space can be 
divided into two, those keys ranked above the target and those ranked below. 
Using the property that the ‘frontier’ between these two spaces is convex, they 
are able to ‘trim’ each space down until the key rank has been estimated to 
within 10 bits of accuracy. 

Bernstein et al. [1] propose two key ranking algorithms. The first is based 
on [14] and adds a post processing phase which has been shown to tighten the 

1 ePrint report: 2015/689. 
bounds from 10 bits to 5 bits. The second algorithm ranks keys using techniques 
similar to those used to count all $y$-smooth numbers less than $x$. By having an 
accuracy parameter they are able to get their bounds arbitrarily tight (at the 
expense of run time). 

Glowacz et al. [6] construct a novel rank estimation algorithm using convolution 
of histograms. Using the property that $S_1 + S_2 := \{x_1 + x_2 \mid x_1 \in S_1, x_2 \in S_2\}$ 
can be approximated by histogram convolution, by creating a histogram per key 
chunk and convolving them all together, they are able to efficiently estimate 
the key rank to within 1 bit of precision. 
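
The idea of histogram convolution can be illustrated as follows. This is a simplified sketch written for this discussion, not the algorithm of [6] (which also derives bounds on the estimation error): the equal-width binning, the bin count `nbins`, and the names `convolve` and `estimate_rank` are all assumptions.

```python
def convolve(h1, h2):
    """Discrete convolution: out[a + b] accumulates h1[a] * h2[b]."""
    out = [0] * (len(h1) + len(h2) - 1)
    for a, c1 in enumerate(h1):
        for b, c2 in enumerate(h2):
            out[a + b] += c1 * c2
    return out

def estimate_rank(scores, secret, nbins=16):
    """Estimate how many keys score higher than the secret key."""
    lo = min(min(col) for col in scores)
    hi = max(max(col) for col in scores)
    width = (hi - lo) / nbins if hi > lo else 1.0

    def bin_index(s):
        # Equal-width bins over [lo, hi]; the maximum falls in the last bin.
        return min(int((s - lo) / width), nbins - 1)

    total = [1]       # histogram of the empty sum
    secret_bin = 0    # bin of the secret key in the combined histogram
    for col, sk in zip(scores, secret):
        hist = [0] * nbins
        for s in col:
            hist[bin_index(s)] += 1
        total = convolve(total, hist)
        secret_bin += bin_index(col[sk])
    # Keys in strictly higher bins are counted as scoring above the secret key.
    return sum(total[secret_bin + 1:])

# Hypothetical integer scores: 2 chunks with 3 candidate values each.
scores = [[9, 5, 1], [7, 2, 6]]
print(estimate_rank(scores, (1, 2)))
```

The cost is governed by the number of bins rather than by the number of keys, which is why the approach scales; the price is that keys falling in the same bin as the secret key cannot be ordered, hence the result is an estimate.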

Duc et al. [4] perform key ranking using a method inspired by Glowacz et al. [6]. 
They repeatedly ‘merge’ the data in one column at a time (as the histograms 
were convolved in one at a time). Each piece of information is downsampled to 
one of a series of discrete values (similar to putting it into a histogram bin). The 
major difference is that instead of just downsampling the original data, they also 
downsample after each key chunk is merged in. 


Key Enumeration. Veyrat-Charvillon et al. [13] propose a deterministic algorithm 
to enumerate keys based on a divide-and-conquer approach. Using a tree-like 
recursion (starting with two subkeys, then four, all the way to sixteen) and 
keeping track of what they call the frontier set (similarities can be drawn to the 
frontier of Veyrat-Charvillon et al. [14]), they are able to efficiently enumerate 
keys. 

Ye et al. [15] present what they describe as a Key Space Finding algorithm. A 
Key Space Finding algorithm takes in the distinguishing score vector and returns 
two outputs: the minimum verification complexity to ensure a desired success 
probability, along with the optimal effort distributor which achieves this lower 
bound. Given this it is straightforward to run a (probabilistic) key enumeration 
algorithm. The distinguisher intuitively moves the boundary of which subkeys 
to enumerate until the desired probability is achieved. 

Bogdanov et al. [2] create a score based key enumeration algorithm which 
can be seen as a variation of depth first search. Potential keys are generated via 
score paths, each of which has an associated score, which in conjunction 
with a precomputed score table allows for efficient pruning of impossible paths. 
From here it is possible to efficiently enumerate possible values. 


Reflecting on the Approaches Taken by Previous Work. All of the previ- 
ous work has treated key rank and key enumeration as two disjoint problems and 
hence approached them using different techniques. For instance, it is unclear how 
to extend the existing key rank algorithms to enumerate keys, and conversely, it is 
not apparent how to simplify the enumeration algorithms to compute key ranks 
efficiently (i.e. without just counting as you enumerate). We however believe 
that both of these problems are highly similar in nature and by maintaining 
some structure within the key rank it should be possible to enumerate without 
making the ranking inefficient. In the remainder of the paper we explain how to 
do just that: efficient key ranking with enough structure to make (fully parallel) 
enumeration possible. 

2 Casting the Key Enumeration as a Knapsack 

We now explain how the key enumeration problem can be formulated as a variant 
of a knapsack problem. In its most basic form a knapsack problem takes a set 
of $n$ items that have a profit $p_i$ and a weight $w_i$. A binary variable $x_i$ is used 
to select items from the set. The objective is then to select items such that the 
profit is maximised whilst the total weight of the items does not exceed a set 
maximum $W$: 

$$\text{maximize: } \sum_{i=0}^{n-1} p_i \cdot x_i$$

$$\text{subject to: } \sum_{i=0}^{n-1} w_i \cdot x_i \leq W, \qquad x_i \in \{0, 1\},\ \forall i$$

The counting knapsack (#knapsack) problem is then understood to be the 
associated counting problem: given a knapsack definition, count how many solutions 
there are to the knapsack problem. 
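
As a point of reference, the counting problem for the basic knapsack can be solved exactly by a textbook dynamic program when the weights are non-negative integers. The sketch below is such a pseudo-polynomial counter, shown only to make the problem concrete (it is not the FPRAS or branching-program technique cited later); the function name `count_knapsack` and the sample weights are assumptions.

```python
def count_knapsack(weights, capacity):
    """Count subsets of items whose total weight is at most `capacity`.

    Assumes non-negative integer weights. Profits play no role when counting.
    """
    # ways[c] = number of subsets of the items seen so far with total weight c
    ways = [0] * (capacity + 1)
    ways[0] = 1  # the empty selection
    for w in weights:
        # Go downwards so that each item is used at most once.
        for c in range(capacity, w - 1, -1):
            ways[c] += ways[c - w]
    return sum(ways)

# Hypothetical instance: three items and a weight limit of 3.
print(count_knapsack([1, 2, 3], 3))
```

The running time is proportional to the number of items times the capacity, which already hints at why the size of $W$ matters for the algorithms that follow.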

Intuitively, we should be able to frame the key rank computation problem as 
a knapsack variant. In contrast to a basic knapsack, however, we have classes of 
items (these are the distinguishing vectors $\mathbf{k}^j$), profits can be dropped since we 
are counting the number of solutions, and we must take exactly one item from 
each class. The weight $w_{i,j}$ for each item can be derived 2 from the distinguishing 
score, $w_{i,j} = \mathrm{MapToWeight}(d_{i,j})$, in such a way that higher distinguishing scores 
lead to lower weights. 3 We define the maximum weight $W$ as the sum of the 
weights associated with the secret key chunks, i.e. $W = \sum_{j=0}^{m-1} w_{sk_j,j}$. Recall 
that we assume if several keys have weight $W$ the secret key (which must be 
among those) is listed first. This enforces $W$ as a strict upper bound in the 
knapsack definition. 

The multiple-choice knapsack problem that identifies keys with weight lower 
than W is then defined as follows: 

$$\sum_{j=0}^{m-1} \sum_{i=0}^{n-1} w_{i,j} \cdot x_{i,j} < W$$

$$\sum_{i=0}^{n-1} x_{i,j} = 1,\ \forall j$$

$$x_{i,j} \in \{0, 1\},\ \forall i, j$$

2 For the sake of readability, we do not discuss the implications of needing to map 
distinguishing scores (which are floating point values) to weights at this point, but 
refer the reader to Sect. 5.1 for a discussion. 

3 This ensures compatibility with knapsack notation. 
The first constraint ensures that all keys (i.e. selections of items) have a 
weight lower than the secret key. The second constraint ensures that only one 
item per distinguishing vector is selected. The counting version of this multiple-choice 
knapsack amounts to computing the key rank. 
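
To make the counting version concrete, the following sketch counts the selections of exactly one item per class whose total weight stays strictly below $W$, using a plain dynamic program over partial weights. It is a straightforward baseline under the assumption of non-negative integer weights, not the graph-based algorithm of Sect. 3; the name `count_keys_below` and the weights are hypothetical.

```python
def count_keys_below(weights, limit):
    """Count keys (one item per chunk) with total weight strictly below `limit`.

    `weights[j][i]` is the integer weight of value i in chunk j.
    """
    if limit <= 0:
        return 0
    # ways[c] = number of partial keys with total weight c, for c < limit
    ways = [0] * limit
    ways[0] = 1
    for column in weights:
        nxt = [0] * limit
        for c, count in enumerate(ways):
            if count == 0:
                continue
            for w in column:
                if c + w < limit:   # enforce the strict upper bound
                    nxt[c + w] += count
        ways = nxt
    return sum(ways)

# Hypothetical weights for m = 2 chunks with n = 3 values each.
weights = [[0, 2, 5], [1, 1, 4]]
secret = (1, 2)                                    # value index per chunk
W = sum(col[s] for col, s in zip(weights, secret)) # weight of the secret key
print(W, count_keys_below(weights, W))
```

Since the secret key itself has weight exactly $W$, it is never counted, in line with the strict inequality of the first constraint.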

Counting solutions to knapsack problems in general is known to be a compu- 
tationally hard problem, and known classical solutions [5] rely on combinations of 
dynamic programming and rejection sampling to construct an FPRAS. Gopalan 
et al. [7] more recently utilise branching programs for efficient counting, and 
we took inspiration from this paper to approach the solution to our counting 
problem. 

To illustrate our solution, we have to slightly modify the knapsack repre- 
sentation. It will be convenient to express the multiple-choice knapsack as a 
multi-dimensional knapsack variation as follows. Each key chunk corresponds to 
‘one dimension’. Each item $k_{i,j}$ has an associated weight vector $\mathbf{w}_{i,j}$ of length 
$m + 1$ of the form $(w_{i,j}, 0, \ldots, 1, 0, \ldots, 0)$, where the 1 is in the position associated 
with chunk $j$. The global weight is also expressed as a vector $\mathbf{W} = (W, 2, \ldots, 2)$ of length $m + 1$. 
The key rank problem is then to count the number of solutions (that satisfy all 
constraints simultaneously) to 

$$\sum_{j=0}^{m-1} \sum_{i=0}^{n-1} \mathbf{w}_{i,j} \cdot x_{i,j} < \mathbf{W}$$

$$x_{i,j} \in \{0, 1\},\ \forall i, j$$

The constraint $\mathbf{W}$ ensures that all keys that are counted have a strictly lower 
weight than the secret key. If the weight vector has a 1 in position $j$, it means 
that this is a value for the $j$th key chunk. Since the weight limit is 2 in the 
constraint vector $\mathbf{W}$, it means that only a single value for any key chunk can be 
chosen. We now illustrate this by a simple example. 

Example 1. Our illustrative example, which will run throughout the paper, consists 
of two distinguishing vectors with three elements each: $\mathbf{k}^0 = (0, 1, 3)^T$, and 
$\mathbf{k}^1 = (0, 2, 3)^T$. We assume that the secret key, SK, is (2, 1). First we derive 
the global weight constraint vector. In this case it has length $m + 1 = 3$ and 
contains the maximum weight $W = w_{2,0} + w_{1,1} = 3 + 2 = 5$, which results in 
$\mathbf{W} = (5, 2, 2)$. The weight vectors of the $k_{i,j}$ are: 

$\mathbf{w}_{0,0} = (0, 1, 0)$, $\mathbf{w}_{1,0} = (1, 1, 0)$, $\mathbf{w}_{2,0} = (3, 1, 0)$ 

$\mathbf{w}_{0,1} = (0, 0, 1)$, $\mathbf{w}_{1,1} = (2, 0, 1)$, $\mathbf{w}_{2,1} = (3, 0, 1)$ 

Given that $W = 5$, all except two of the combinations are below this threshold. 
Hence the solutions to the knapsack are: 

$(k_{0,0}, k_{0,1})$, $(k_{0,0}, k_{1,1})$, $(k_{0,0}, k_{2,1})$, 
$(k_{1,0}, k_{0,1})$, $(k_{1,0}, k_{1,1})$, $(k_{1,0}, k_{2,1})$, 
$(k_{2,0}, k_{0,1})$ 
Notice that the knapsack solution will never contain the secret key itself, as 
it returns all keys with weight strictly less than the weight of the secret key. For 
the ranking problem this would give us that our secret key has rank 8. 
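
The count in Example 1 can be checked by brute force. The following is a minimal sketch, not part of the algorithm proposed here: it simply tries every combination of one item per chunk. The layout `weights[j][i]` (weight of item i in chunk j) and the function name are assumptions made for illustration.

```python
from itertools import product

# Running example: weights[j][i] is the integer weight of item i in key chunk j.
weights = [[0, 1, 3], [0, 2, 3]]
secret_key = (2, 1)  # item index chosen in each chunk


def brute_force_count(weights, secret_key):
    """Count keys whose total weight is strictly below that of the secret key."""
    W = sum(weights[j][i] for j, i in enumerate(secret_key))
    return sum(
        1
        for key in product(*(range(len(chunk)) for chunk in weights))
        if sum(weights[j][i] for j, i in enumerate(key)) < W
    )


count = brute_force_count(weights, secret_key)
print(count)
```

Exhaustive search of this kind is only feasible for toy instances, which is why a graph-based counting method is needed for realistic key sizes.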

For standard knapsack problems it is well known [3] that solutions can be 
found via finding longest paths on a directed acyclic graph. In the following 
section we will show that such a graph exists also for our knapsack, and impor- 
tantly, that the resulting graph allows for a particularly efficient path counting 
solution, which gives us a solution to the Key Rank problem. 

3 An Accurate Key Rank Algorithm 

In this section we first define the graph and illustrate how it relates to the multi- 
dimensional knapsack via intuition and a working example. We then explain our 
fast path counting algorithm for a compact representation of the graph. 


3.1 Key Rank Graph 

Recall that our multi-dimensional knapsack has $m \cdot n$ elements, and for each 
element we have a weight vector. Also, a correct solution to the multi-dimensional 
knapsack must have a weight that is strictly smaller than $W$. Since we need to 
be able to represent all permissible solutions we need $W$ extra vertices (per 
element). This means that we ‘encode’ all solutions to the knapsack in a graph 
with $m \cdot n \cdot W$ vertices (plus an extra two for accept and reject nodes). The 
vertices corresponding to item $k_{i,j}$ are labelled $V^w_{i,j}$, where the variable $w$ denotes 
the ‘current weight’. The key rank graph contains a start node S, an accept node 
A and a reject node R. The edges are constructed as follows: 

- $(V^w_{i,j}, V^w_{i+1,j})$ which corresponds to the item not being chosen in this set 

- $(V^w_{i,j}, V^{w + w_{i,j}}_{0,j+1})$ if the item is chosen for this set and $w + w_{i,j} < W$ 

- $(V^w_{n-1,j}, R)$ if no elements are chosen from the set 

- $(V^w_{i,m-1}, A)$ if the item is chosen for the last set and $w + w_{i,m-1} < W$ 

- $(V^w_{i,j}, R)$ if the item is chosen for this set and $w + w_{i,j} \geq W$ 

- $S = V^0_{0,0}$ to set up the start node 

When visualising the key rank graph it will be convenient to think of the 
indices i, j as though they are ‘flattened’ (i.e. they are topologically sorted and 
occur in a linear order). In this representation the graph is $n \cdot m$ deep and $W$ wide, 
where the width of the key rank graph essentially tracks the current weight (of 
the partial keys). Each vertex has exactly two edges coming out of it (with the 
exception of A and R): either the vertex was ‘included’ (this corresponds to the 
choice of selecting the corresponding value of the key chunk to become part of 
the key) or not. If it was included then the edge must point to the first item 
in the next key chunk, as we can only choose one item per key chunk, and the 
weight must be incremented by the weight of this key item. If the item/vertex is 
not chosen then the edge must go to the next item in the chunk (or reject if this 
was the last item) and the weight must not be incremented as the item was not 
selected. For any partial key, if a new key chunk is added and the new weight 
is no longer strictly below $W$ then this is not a valid key and thus the path will 
go to the reject node R. 


Graph Example. To illustrate the working principle we provide in Fig. 2 the 
process of constructing the graph for the example provided above. 

Initially the graph is constructed (top right) and the start node is initialised 
based on the rule $S = V^0_{0,0}$. The width of the graph is set to be 5 (0-4) as this 
matches our maximum weight ($W = 5$). The depth of the graph becomes 6 as 
each chunk contains 3 items. 

Next (middle left) the first two children from the start node are created. 
The edge denoting that the key chunk value is selected (the right child) is built 
following the rule $(V^0_{0,0}, V^0_{0,1})$, which creates the edge from the start node to the 
first element in the next chunk. The edge denoting that the value is not selected (the 
left child) is built following the rule $(V^0_{0,0}, V^0_{1,0})$, which creates the edge from the 
start node to the next item within the same chunk. 

Moving onto the following step (middle right), children continue to be created 
through the same set of rules. However, note that at this point a link is created to 
the accept node based on the rule $(V^0_{0,1}, A)$; this demonstrates that the selection 
of key chunk value 0 followed by value 0 is a valid solution to the problem. 

In the following steps links continue to be created based on the rules until all 
paths in the graph are created. Please note that throughout the construction of 
the graph, the last item in each chunk will have a left child that points to reject 
(as there are no further items in the chunk to select) but these have been omitted 
from the example diagram for the sake of clarity. 

All the greyed out nodes also have their children calculated. However, as they 
do not alter the path count, we have excluded them from the example figures to 
aid clarity. 

Each path from S to A corresponds to a key with lower weight than our 
secret key. Thus, counting these paths will yield the rank of the secret key. While 
in general path counting is hard [12], we explain how our graph structure, having 
at most two outgoing edges per node, lends itself to efficient counting. 


3.2 Counting Valid Paths 

Clearly our key rank graph is a directed acyclic graph. We have already men- 
tioned that it is convenient to ‘flatten’ the graph (as it has been presented in the 
example). This ‘flat’ graph is also more suited for an efficient counting algorithm. 
[Figure 2: panels showing successive stages of the graph construction. Rows are the key chunks as (value, weight) pairs, columns are the cumulative weight 0 to 4, with W = 5. The legend distinguishes left child and right child edges.] 

Fig. 2. An example showing the construction of the graph for the small example 
instance provided 
Hence, from this point onwards we will assume that the graph is topologically 
sorted (see footnote 4). The start node will be labelled 1 and the final node will be labelled 
A. There are $n \cdot m \cdot W + 2$ vertices in the graph, so $A = n \cdot m \cdot W + 2$ 
and $R = A - 1$. 

We also assume two constant time functions $LC(\cdot)$, $RC(\cdot)$ which return the 
index of the left child and right child, respectively. The algorithmic descriptions 
of these functions can be found in Fig. 4 for our particular graph. We therefore 
have the following recurrence relation, where PC is a vector and PC[i] stores 
the number of accepting paths from i to A: 


$$PC[c] = \begin{cases} 1, & \text{if } c = A, \\ 0, & \text{if } c = A - 1, \\ PC[LC(c)] + PC[RC(c)], & \text{otherwise.} \end{cases}$$


The total number of paths between 1 (our start node) and A (our accept 
node) is then simply PC[1]. This recurrence relation forms the algorithm given 
in Algorithm 1, which assumes that LC, RC are globally accessible functions. 


Algorithm 1. The key rank algorithm 
PC[A] ← 1 
PC[A - 1] ← 0 
for c = A - 2 down to 1 do 
    PC[c] ← PC[LC(c)] + PC[RC(c)] 
end for 
return PC[1] 


For an example of this, see Fig. 3. This figure shows that the vector is 
traversed from the end back to the start, and cells are filled by summing the values 
in their left and right children cells. For clarity in the figure, we only show 
example links between two cells, whereas in practice they are present on all. 
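
The recurrence can be turned into a short program. The sketch below is one possible reading, under stated assumptions: nodes are numbered from 0 rather than 1, the index of node (chunk, item, weight) is `chunk * n * W + item * W + weight`, and `weights[j][i]` holds the integer weight of item i in chunk j. The names `key_rank_count`, `left_child` and `right_child` are illustrative, not taken from the authors' implementation.

```python
def key_rank_count(weights, secret_key):
    """Count keys with weight strictly below the secret key's weight.

    weights[j][i] is the integer weight of item i in chunk j (assumed layout).
    """
    m, n = len(weights), len(weights[0])
    W = sum(weights[j][i] for j, i in enumerate(secret_key))
    if W == 0:
        return 0  # no key can be strictly lighter than weight 0
    R = m * n * W  # reject node
    A = R + 1      # accept node, last in the ordering

    def left_child(c):
        # Item not chosen: next item of the same chunk, same weight.
        item = (c % (n * W)) // W
        return R if item == n - 1 else c + W

    def right_child(c):
        # Item chosen: add its weight, jump to the first item of the next chunk.
        w = c % W
        item = (c % (n * W)) // W
        chunk = c // (n * W)
        new_w = w + weights[chunk][item]
        if new_w >= W:
            return R
        if chunk != m - 1:
            return (chunk + 1) * n * W + new_w
        return A

    pc = [0] * (A + 1)
    pc[A] = 1  # one (empty) path from A to itself
    pc[R] = 0  # no accepting path leaves the reject node
    # Children always have larger indices, so a backwards sweep suffices.
    for c in range(R - 1, -1, -1):
        pc[c] = pc[left_child(c)] + pc[right_child(c)]
    return pc[0]


example_count = key_rank_count([[0, 1, 3], [0, 2, 3]], (2, 1))
print(example_count)
```

Python integers are arbitrary precision, so the path counts cannot overflow; an implementation in a language with fixed-width integers would need a big-integer type for large key spaces.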


Correctness. The base case of PC[A] = 1 is self explanatory; there is exactly 
one path from A to A, the path involving no edges. From an arbitrary node c, 
it is possible to traverse the edge to the left child (and thus take however many 
paths start there) or traverse the edge to the right child (and do the same) and 
we conclude that PC[c] = PC[LC(c)] + PC[RC(c)]. We can iterate over all 
nodes starting at the final node A and working backwards until we reach the 
start node, and since our graph is topologically sorted when we are operating on 
a node, the values for the node's children will already have been calculated as 
they come later in the topological sorting. 

4 It turns out that because the path counting for the key rank graph is already 
extremely fast and memory efficient, the choice of sorting is irrelevant, bar the 
exception that S must be the first node and A must be the final node. However, 
when it comes to key enumeration this will be an important consideration and thus 
will be discussed in further detail in the corresponding section. 

Fig. 3. Example demonstrating how the path count is calculated. The child links 
of two nodes have been included to demonstrate the process. 

We note at this point that this counting algorithm is exact. However, as we 
pointed out before, we need to convert floating point distinguishing scores to 
integer weights, and this conversion may incur a loss of precision and hence 
could cause a loss of accuracy. We discuss this in Sect. 5. 


Time Complexity. The time complexity of the key rank algorithm depends on 
the number of vertices in the key rank graph and their size. Our graph contains 
$A = m \cdot n \cdot W$ vertices. The integers stored in the vertices could be up to $O(2^A)$ 
(and thus be of size $O(A)$) because in the worst case each value can be double 
the previous value (if $PC[LC(i)] = PC[RC(i)]$). Hence, given that we have $A$ 
vertices and perform an integer addition with an $A$-bit variable for each vertex, 
we have worst-case time complexity of $O(A^2) = O(m^2 \cdot n^2 \cdot W^2)$. 

However, whilst we touch each vertex once, we know that there are at most 
$O(n^m)$ keys. Consequently, we need no more than $O(m \log n)$ bits to store the 
path count (in contrast to the $O(A) = O(m \cdot n \cdot W)$ bits for the worst case). 
Hence the time complexity for computing the key rank via the key rank graph 
is $O(m^2 \cdot n \cdot W \cdot \log n)$. 

It is worth noting that the key depth does not factor into the time complexity 
and the following example will help to clarify this. Consider the target key which 
has weight 1 in every column (which gives $W = 16$ and a grid with 65536 nodes). 
If all other key chunks have weight 0 then the target key will have rank $2^{128}$ since 
all other keys have a lower weight. However, if all other key chunks have weight 
2, then our target key will have rank 0 because all other keys have a higher 
weight. None of the other values affect the size of the graph and thus it is clear 
that the runtime is not changed by the key depth. 

In fact for AES-128 the values of m, n are also fixed and thus we get that 
the algorithm runs in $O(W)$, that is to say it is linear in the weight of the secret 
key. See Sect. 5.1 for experiments supporting this. 
Algorithm 2. The key enumeration algorithm 
KL[A] ← ∅ 
for c = A to 1 do 
    KL[c] ← (value(c), KL[RC(c)]) ∪ KL[LC(c)] 
end for 
return KL[1] 


4 Parallelisable Key Enumeration Algorithm 

We are able to further modify our algorithm such that, with minor (standard 
book-keeping) adjustments, it lists all valid paths, as opposed to just 
counting them, with reasonable efficiency. The algorithm is given in Algorithm 2 
and requires an additional (constant time) function, value, which, given an 
index c, returns the value of a vertex. We write $(a, \{x_c\}_c)$ to mean $\{(a, x_c)\}_c$. 
That is to say, if we concatenate an item a with a set, we are really concatenating 
the item a with every item in the set to form a new set. 

It is now easy to use this for key enumeration. Assume we have the resources 
to enumerate/test up to B keys. Then, we choose some weights (which corre- 
spond to key guesses) and use the key rank algorithm to determine their ranks 
and compare them to B. This allows us to quickly select the appropriate W for 
the given B. Then, Algorithm 2 proceeds as follows: for any valid path (in the 
key rank graph), every time a right child is taken (this can be determined by the 
node indices) the corresponding value for the respective key chunk is chosen. A 
left child means that we are not taking a particular value for key chunk. In this 
manner the keys are effectively reconstructed from the key rank graph. 
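
The enumeration recurrence can be sketched in the same way as the counting one. The code below is an illustration under several assumptions rather than the authors' implementation: nodes are numbered from 0, `value(c)` is taken to be the item index of node c, the accept node holds the single empty suffix so that prefixes can be attached to it, and the reject node holds nothing. It stores a key list at every node, so it is suitable only for toy inputs.

```python
def enumerate_keys(weights, W):
    """List every key (one item index per chunk) with total weight < W.

    weights[j][i] is the integer weight of item i in chunk j (assumed layout).
    """
    m, n = len(weights), len(weights[0])
    if W <= 0:
        return []
    R = m * n * W  # reject node
    A = R + 1      # accept node

    def decode(c):
        # Recover (chunk, item, weight) from the node index.
        return c // (n * W), (c % (n * W)) // W, c % W

    def left_child(c):
        _, item, _ = decode(c)
        return R if item == n - 1 else c + W

    def right_child(c):
        chunk, item, w = decode(c)
        new_w = w + weights[chunk][item]
        if new_w >= W:
            return R
        return (chunk + 1) * n * W + new_w if chunk != m - 1 else A

    kl = [None] * (A + 1)
    kl[A] = [()]  # assumed base case: the empty suffix
    kl[R] = []    # nothing can be completed from the reject node
    for c in range(R - 1, -1, -1):
        _, item, _ = decode(c)
        # Taking the right child selects this item for the current chunk.
        chosen = [(item,) + suffix for suffix in kl[right_child(c)]]
        kl[c] = chosen + kl[left_child(c)]
    return kl[0]


example_keys = enumerate_keys([[0, 1, 3], [0, 2, 3]], 5)
print(sorted(example_keys))
```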

If one wanted to enumerate the keys in a smart order, this would simply be 
a case of altering the construction of the tree which stores the valid key chunks 
for enumeration. Currently the valid key chunks are stored in numerical order 
within the tree, however if this was changed such that they were stored in order 
of scores, the keys would be rebuilt in a near optimal order. 

In the rest of this section we discuss run time and memory requirements. 
Whilst the run time is bounded by the number of keys that we want to enumer- 
ate, we show there are different strategies to improve the memory performance. 
Finally, we show that with a further simple observation, we can parallelise the 
key enumeration algorithm. 


4.1 Time Complexity 

We begin with a worst case analysis considering a general graph. In this case, 
the enumeration algorithm would be exponential in the number 
of vertices, because to generate all paths (each vertex has two children) the 
algorithm clearly must take $O(2^A)$ time. 

However, in our key rank graph, each path corresponds to a valid key with 
weight lower than $W$. Considering this, the run time of this algorithm is relative 
to the rank (which is determined by $W$) and not to the total number of keys; 
hence this algorithm can be used to enumerate keys for a given workload in time 
$O(m^2 \cdot n \cdot W \cdot B \cdot \log n)$. This is because all $O(n \cdot m \cdot W)$ nodes are touched once, 
and B keys are reconstructed which are of length $m \cdot \log n$. 


4.2 Memory Efficiency 

How we topologically sort the key rank graph has a major impact on the memory 
efficiency of the key enumeration. While there are a variety of explicit topological 
sorting algorithms in the literature [8,11], we are able to avoid explicit sorting 
because we know our graph structure in advance. Hence, we show that our graph 
can be sorted implicitly by how the nodes are numbered within the calculation 
of the left and right child functions. The remaining question is what method of 
sorting is the most desirable. 

In Fig. 4 we demonstrate topologically sorting the example graph previously 
considered in Fig. 2, as well as present the associated pseudo code. There are 
alternative sorting methods available which were considered, and we discuss the 
pros and cons of these in the extended version of this paper available on ePrint 5 . 
We also discuss how to improve memory efficiency further by appropriately stor- 
ing the generated keys. 

Wide Sort. In this sorting the graph is numbered one chunk at a time, one 
item at a time, along the weight in increasing order (see Fig. 4). Formally, given 
a chunk, item and weight (x, y, z) the index is $i = x \cdot W \cdot n + y \cdot W + z$. This 
is a valid topological sorting of the graph, since a node's children will be either 
one item lower in the same chunk (for the left child) or the first item in the next 
chunk (right child), both of which have a higher number. 

This is the topological sort we described for key rank. Since key rank is 
extremely fast, we described the most intuitive sort there, as the choice did not 
have an impact on performance; with enumeration this is no longer the case 
and the sorting must be taken into consideration. 

The advantage of this sorting is that an element only needs to look at the 
item below and the item at the top of the next chunk; these 
are the only things that need to be stored in memory. This makes it very memory 
efficient, requiring $O(W)$ memory. 

The disadvantage of this method is that it is highly serial and it does not 
seem possible to (easily) parallelise. 
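
The wide-sort numbering and its inverse are simple integer arithmetic. The following sketch uses the index formula as given in the text, which yields 0 for the start node; the function names are illustrative assumptions.

```python
def wide_index(chunk, item, weight, n, W):
    """Wide-sort index of the node (chunk, item, weight), counting from 0."""
    return chunk * W * n + item * W + weight


def wide_coords(index, n, W):
    """Inverse of wide_index: recover (chunk, item, weight)."""
    chunk, rest = divmod(index, n * W)
    item, weight = divmod(rest, W)
    return chunk, item, weight


# Running example: n = 3 items per chunk, W = 5.
idx = wide_index(1, 2, 3, n=3, W=5)
print(idx, wide_coords(idx, n=3, W=5))
```

Adding W to an index moves one item down within the same chunk at the same weight, which is why the left child can be computed without decoding the coordinates.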


Key Storage. The topological sorting of the graph is clearly a crucial factor for 
memory efficiency. The other factor is how keys are represented/stored within 
our graph. 

In the algorithm as described, all (partial) keys are stored at each point in 
the algorithm. This will become very inefficient. Consider, for example, the case 
where you want to enumerate all keys. There are $2^{120}$ keys which have the first 

5 https://eprint.iacr.org/2015/689. 
[Figure 4 (top): the example graph with its nodes numbered by the wide sort, plotted against cumulative weight 0 to 4.] 

Left Child 
if (n · W) - (c mod (n · W)) ≤ W then 
    return R 
else 
    return c + W 
end if 

Right Child 
w' ← c mod W 
i ← ((c - w') mod (n · W)) / W 
j ← (c - w' - i · W) / (n · W) 
if w' + w_{i,j} ≥ W then 
    return R 
else if j ≠ m - 1 then 
    return (j + 1) · n · W + w' + w_{i,j} 
else 
    return A 
end if 


Fig. 4. Topological sorting of our previous example. Note that the deepest node in 
each chunk will be guaranteed to have a left child leading to R ; for clarity these paths 
are omitted (top). Pseudo code of how the child indices are calculated for each node 
in the tree (bottom). 

key chunk set to zero (hence this chunk would be duplicated $2^{120}$ times). Clearly, 
one needs to choose an appropriate data structure, and we use a tree, see Fig. 5. 
This key tree is passed to a separate algorithm that converts it into a series of 
keys for testing. The advantage of this is threefold. First, it greatly speeds up 
the enumeration. Second, the conversion of the key tree into a list of keys is 
trivially parallelisable. Third, the actual testing (in our case checking the 
AES encryption using a given plaintext/ciphertext pair) can be amortised into 
this cost. 

Fig. 5. The key tree for all possible three character keys containing ‘A’ or ‘B’ 

4.3 Parallelisation 

We can achieve parallelisation with a simple observation: instead of letting all 
vertices with a weight lower than $W$ go to the accept state, we adjust the graph 
so that only vertices with weight in the range between two weights 
$W_1$ and $W_2$ reach the accept state. The width of the graph is defined by 
$W_2$; $W_1$ has no impact on the graph size. This results in an algorithm that 
enumerates ‘batches’ of likely keys. Hence, one can run multiple instances of the 
key enumeration algorithm in parallel, where each instance enumerates a unique 
batch of keys. 

All ranges of keys can be computed in parallel and require no communication 
between threads except for the initial passing of the distinguishing scores and a 
notification when the key has been found. It is hence trivial to utilise multiple 
machines (or cores). 
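
The batching idea can be illustrated independently of the graph. The sketch below swaps in a plain depth-first search with pruning in place of the key rank graph, purely to show that batches defined by consecutive weight bounds are disjoint and need no communication. The function names and the use of threads are assumptions; Python threads do not give a real speed-up for CPU-bound work, so a practical version would use separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def enumerate_batch(weights, w_lo, w_hi):
    """Return all keys whose total weight w satisfies w_lo <= w < w_hi."""
    m = len(weights)
    found = []

    def extend(chunk, partial, w):
        if w >= w_hi:
            return  # weights are non-negative, so this prefix can never qualify
        if chunk == m:
            if w >= w_lo:
                found.append(tuple(partial))
            return
        for item, item_w in enumerate(weights[chunk]):
            partial.append(item)
            extend(chunk + 1, partial, w + item_w)
            partial.pop()

    extend(0, [], 0)
    return found


def enumerate_in_batches(weights, bounds, workers=4):
    """Run one independent batch per consecutive pair of weight bounds."""
    pairs = list(zip(bounds, bounds[1:]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: enumerate_batch(weights, *b), pairs))


batches = enumerate_in_batches([[0, 1, 3], [0, 2, 3]], [0, 2, 4, 5])
print([len(b) for b in batches])
```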

Setting W. In an enumeration setting, the correct key, and therefore $W$, is 
unknown. We create a series of ‘steps’ in $W$ (to bound using $W_1$, $W_2$ as 
introduced previously), which are enumerated in order until the correct key is found. 

Iterating across these $W$ increments, we select the weights by first taking 
the most probable across all distinguishing vectors, i.e. the weight at the top of 
each column. If the correct key is not located, the weight limit is increased by 
an amount equal to moving down by one key chunk in a column. The generation 
of each $W$ step is done according to Algorithm 3. 

More complex methods of bounding the weights could be used, such as binary 
searches or similar, but this would increase the cost of calculating the capacities 
before the enumeration begins, with little tangible benefit. 

Also, it should be noted that if we simply incremented the capacity in the 
smallest possible steps, then the algorithm would then be guaranteed to be 
accurate, enumerating keys in the correct order. However, this would make par- 
allelism nearly impossible as each unit of work would be too small causing the 
overhead from the parallel computation to dominate the runtime. 
Algorithm 3. Generating W increments for enumeration when W is unknown 

for k = 0 to m do 
    c ← 0_{0,...,m} 
    for i = k to m do 
        for j = 0 to n do 
            c_i ← c_i + 1 
            Calculate W of key chunks at depths c 
        end for 
    end for 
end for 


Further Speed Optimisations. Currently the algorithm operates on every node 
of the graph. However, some of the nodes are not even reachable from the start 
node (for example the greyed out nodes in Fig. 2). Hence any computation done 
on these nodes is wasted because it will never be combined into a solution. By 
precalculating the number of valid paths from S to all other nodes in the graph (a 
reasonably cheap operation compared to a large key enumeration - this is done 
using the key rank algorithm), we can skip over a node if the number of paths 
from the start node to here is 0 because any work here will not be combined 
with the final solution. 
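
A forward pass over the same node numbering gives the number of paths from the start node to every other node. The sketch below is one way this precomputation might look, assuming 0-based wide-sort indices and the `weights[j][i]` layout used for illustration; it is a variant of the counting sweep, not the authors' code.

```python
def paths_from_start(weights, W):
    """Number of paths from the start node to each graph node (A and R excluded)."""
    m, n = len(weights), len(weights[0])
    if W <= 0:
        return []
    total = m * n * W
    from_start = [0] * total
    from_start[0] = 1
    # Children always have larger indices, so one increasing sweep suffices.
    for c in range(total):
        if from_start[c] == 0:
            continue  # unreachable node: nothing to propagate
        chunk, rest = divmod(c, n * W)
        item, w = divmod(rest, W)
        if item != n - 1:
            from_start[c + W] += from_start[c]  # left child
        new_w = w + weights[chunk][item]
        if new_w < W and chunk != m - 1:
            from_start[(chunk + 1) * n * W + new_w] += from_start[c]  # right child
    return from_start


counts = paths_from_start([[0, 1, 3], [0, 2, 3]], 5)
reachable = sum(1 for x in counts if x > 0)
print(reachable, "of", len(counts), "nodes are reachable")
```

In the running example fewer than half of the nodes are reachable, so skipping the rest removes a sizeable share of the list manipulation during enumeration.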

5 Practical Evaluation and Comparison with Previous 
Work 

Our key enumeration and key rank algorithms are both based on a graph 
representation of a multi-dimensional knapsack. To define this multi-dimensional 
knapsack it is necessary to map distinguishing scores, which typically are 
floating point values, to integer weights. This is a very simple process: each raw 
score in the distinguishing vector, of absolute value at most $2^\alpha$, is multiplied by $2^{p-\alpha}$, 
where p is the number of bits of precision we wish to maintain. Then performing 
an abs has the double effect of removing the negative sign, and making the 
most probable (the most negative numbers) the smallest, meaning they have the 
lightest weight which maps to our knapsack representation perfectly. Formally 
$w_{i,j} = \mathrm{MapToWeight}(d_{i,j})$ where $\mathrm{MapToWeight}(d_{i,j}) = \lfloor \mathrm{abs}(d_{i,j} \cdot 2^{p-\alpha}) \rfloor$ for 
p bits of precision. 
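
The mapping is a one-line function. The sketch below assumes that the outer brackets in the definition denote rounding down and that `alpha` is the exponent bounding the magnitude of the raw scores; both are readings of the text rather than confirmed details, and the function name is illustrative.

```python
import math


def map_to_weight(score, p, alpha):
    """Map a floating point score with |score| <= 2**alpha to an integer weight.

    p is the number of bits of precision to keep (rounding down is assumed).
    """
    return math.floor(abs(score * 2 ** (p - alpha)))


# A score of -3.7 with scores bounded by 2**2, keeping p = 4 bits.
example_weight = map_to_weight(-3.7, p=4, alpha=2)
print(example_weight)
```

Increasing p by one doubles the scale factor, so the weights, and with them W and the width of the graph, roughly double as well.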

This requirement has implications for the performance of our algorithms, as 
the time complexities for both algorithms strongly depend on the parameters 
m (the number of key chunks), n (the number of items per chunk), and W 
(the maximum weight). In particular, for any fixed key size (and number of 
chunks) the size of the graph (i.e. the width) grows with W, and W grows with 
the precision that we allow for in the conversion from floating point values to 
integers. 
We hence focus our practical evaluation on the impact of the precision (see footnote 6) 
on accuracy (see footnote 7) and on performance. First, we discuss the precision requirements 
for practical DPA outcomes. Second, we explore the practical impact on the 
performance of key rank when we increase the precision. Third, assuming we 
allow for sufficient precision, we ask what are the best performance results that 
we can achieve on single but also many core platforms for the key enumeration. 

It is clear that to answer these questions we need to be able to generate 
many practically relevant distinguishing vectors in a manner that is comparable 
to previous work. We hence decided to adopt the simulator used by 
Veyrat-Charvillon et al. [13]. Veyrat-Charvillon et al. create distinguishing vectors based 
on attacking the AES SubBytes output, assuming noisy Hamming weight leaks, 
and using the Hamming weight as power model. Their DPA simulator allows us 
to manipulate the level of noise, and the number of measurements. The simulator 
then performs a standard DPA by utilising template matching as a distinguisher 
(this has been shown by Mangard et al. [10] to be equivalent to performing a 
correlation based DPA with a perfect model). They output ‘additive’ scores (by 
taking the logarithm of the raw matching scores), which we pass directly to 
our MapToWeight function. In all experiments we keep the number of traces 
constant at 30 (which matches [13]) and change the variance of the noise to 
create ‘deeper’ keys. 


5.1 Evaluating and Comparing Precision 

In practical DPA attacks the combination of measured power traces, model 
values, number of traces and distinguisher will influence the effective precision of the 
distinguisher scores (see footnote 8). We discuss the mentioned factors briefly. Then we 
experimentally determine the necessary level of precision for our key rank algorithm, 
and compare this to the number of bins for the method by Glowacz et al. [6]. 

Fig. 6. Impact of the distinguisher (left: correlation, right: Gaussian templates without 
log2) on the precision requirements when considering up to 16 bits of precision. 

6 Precision is the ability to reproduce a measurement result, i.e. if several 
measurements of a variable give very close values then the measurement is precise. 

7 Accuracy is the closeness of a measurement to a true value, i.e. this relates to the 
‘trueness’ of a measurement. 

Precision in Factors Influencing DPA Outcomes. Various factors can 
influence the outcome of a DPA attack, and also have an effect on the amount 
of precision required to accurately represent distinguishing scores. These can 
include the resolution of the leakage traces, the power model used, and the 
distinguisher applied. In our experiments we vary the precision from four to 
sixteen bits. 


Experimentally Measuring Precision for Key Rank and Glowacz et al. 

We ran precision tests using Veyrat-Charvillon’s simulator (using N = 30 and 
variance two) to determine the appropriate level of precision for further 
experiments. We plot the difference in ranking outcomes for increasing precision in 
Fig. 7 (left). In this figure, and in all figures that will follow, we plot outcomes of 
individual experiments in gray, and average outcomes in black. The x-axis shows 
the precision in $w_{i,j}$. The y-axis refers to the change in ranking outcomes when 
increasing the precision by 0.1 bits from the previous step. From 11 bits onwards 
the outcomes do not change anymore. Because our ranking method is exact with 
enough precision, we can infer that with 11 bits of precision in $w_{i,j}$ we produce 
exact ranks. Already from 4 bits of precision (on average, as plotted in black) we 
are within five bits of accuracy from the real result. From about 8 bits onwards, 
increasing the precision changes the ranking outcomes by just under a bit for 
our algorithm. 



[Figure 7: left panel x-axis: bits; right panel x-axis: number of bins (x 10^5).] 

Fig. 7. Bits of precision for key rank (left) and number of bins for Glowacz et al. (right). 

8 Veyrat’s simulator stores values in variables with double precision (i.e. one has 53 bits 
of precision). Effectively, however, only a few of these bits are necessary to contain the 
effective precision. 
We implemented the convolution based method by Glowacz et al. [6]. Their 
method is essentially based on building m histograms (one from each of the 
distinguishing vectors) and counting the keys by counting items in the 
‘amalgamated’ histogram efficiently via convolution. Figure 7 (right) shows that they 
achieve very high average precision (plotted in black) from about 50,000 bins 
onward. We can therefore conclude that using 50,000 bins roughly corresponds 
to 11 bits of precision in $w_{i,j}$. Glowacz et al. [6] actually recommend using 
500,000 bins in their paper. 
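
The histogram idea can be illustrated on integer weights, where one bin per weight value makes the count exact. The sketch below is not the implementation of Glowacz et al., which bins floating point scores and relies on fast convolution; it uses a naive quadratic convolution and illustrative function names to show only the principle.

```python
def convolve(a, b):
    """Plain discrete convolution of two integer sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def histogram_count(weights, W):
    """Count keys with total weight < W by convolving per-chunk histograms."""
    total = [1]  # histogram of the empty key: one key of weight 0
    for chunk in weights:
        hist = [0] * (max(chunk) + 1)
        for w in chunk:
            hist[w] += 1  # one bin per integer weight value
        total = convolve(total, hist)
    return sum(total[:W])


hist_count = histogram_count([[0, 1, 3], [0, 2, 3]], 5)
print(hist_count)
```

On the running example this agrees with the count obtained from the key rank graph, which is the kind of agreement the comparison in this section examines at scale.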

Recall that we hypothesised that different distinguishers would lead to 
different precision requirements. To test this hypothesis we implemented two further 
distinguishers for the simulator: one distinguisher was based on correlation and 
one was based on Veyrat-Charvillon’s method but without applying the 
logarithm. Figure 6 shows the results for them; this time we allowed up to 16 bits of 
precision. The plots show that indeed, different distinguishers require different 
levels of precision, and that correlation has the least requirements. 

To provide further evidence for the exactness of our ranking algorithm 
(provided enough precision), we considered the difference between the key rank 
output by our algorithm, and the key rank output by Glowacz et al. In this 
experiment, we used 16 bits for our algorithm and 500,000 bins for Glowacz et al. Fig. 8 
shows the identical trend as Fig. 1 (right panel) of Glowacz et al. Hence the 
difference between our ranking outcomes and their ranking outcomes is identical 
to the rank estimation tightness that they measure, reinforcing the exactness of 
our ranking outcomes. 



Fig. 8. Observed difference in calculated key rank between our algorithm and 
Glowacz et al. 


5.2 Evaluating and Comparing Run Times for Key Rank 

We explained in Sect. 3 that the run time of the Key Rank algorithm is 
independent of the actual depth of the key. The run time depends on the size of the 
graph, which is fixed for a certain choice of m and n, and hence depends on the 
size of W. Since W is derived from summing the weights of SK, its precision 
will be determined by the precision that we allow during the conversion of the 
distinguishing scores. 

Fig. 9. Impact of the size of W (left) and precision in $w_{i,j}$ (right) on the run time of 
Key Rank 

We hence experimented with the relationship between run time and size of W 
and also precision. We did this by fixing all parameters for Veyrat-Charvillon’s 
simulator and only varying the precision allowed in the function MapToWeight. 
As in the previous graph, we upper bounded the precision in W at 16 bits. 

Figure 9 shows how the run time increases for bigger W (left) and more 
precision in W (right). The run times for sufficient precision (i.e. 8 bits for W) 
are well below half a second. Even with 11 bits of precision (i.e. accurate ranking 
outcomes) our average run time is around 4 s. The plot shows that this average 
(black) is tracked well by the individual experiments (gray). 


5.3 Evaluating and Comparing Run Times for Key Enumeration 

The run time of the key enumeration algorithm (as referred to by KEA in the 
graphs that will follow) is dominated by the depth to the key. Veyrat-Charvillon 
et al. [13] presented the current state of the art for smart key enumeration, and 
they kindly gave us access to the latest version of their implementation. We 
were hence able to run their code alongside ours. Therefore for all graphs that 
we provide in the following, the timings were obtained on identical platforms. 
Note that for all experiements, as the toolbox provided a known secret key on 
which the simulated attack was based, we knew at which point the enumeration 
had found the correct key without needing to performm an AES operation on a 
known plaintext /ciphertext pair (this is common within the literature). 

Single Core vs Multi Core Comparisons. Figure 10 gives a comparison of run 
times of Veyrat-Charvillon et al.'s algorithm and our algorithm on a single core 
(left). We sampled multiple distinguishing scores for each key depth and ran our 
respective key rank algorithms. The graphs show that from key depths just under 
30 bits onwards we clearly outperform Veyrat-Charvillon et al.'s algorithm, even 
on a single core. On the right, we provide performance graphs for running 
our key enumeration algorithm on multiple cores. The graph shows that eight 
cores can enumerate $2^{40}$ keys in the same time as one core enumerates $2^{37}$, which 
is a vast difference. Also of note, a single-core run enumerates 
$2^{38}$ keys in 13.9 h, whereas four cores perform the same enumeration in 6.4 h. 

Fig. 10. Comparison between Veyrat-Charvillon et al.'s enumeration algorithm and our 
algorithm for increasing key depths on a single core (left), and run times for parallel 
instances of the key enumeration (right). 
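The multi-core figures quoted above can be turned into speedup and scaling numbers with a few lines of arithmetic. The sketch below only restates the timings given in the text; the variable names are ours and are not part of the original tooling.

```python
# Timings quoted in the text for enumerating 2^38 keys.
single_core_hours = 13.9
four_core_hours = 6.4

speedup = single_core_hours / four_core_hours  # wall-clock speedup on 4 cores
efficiency = speedup / 4                       # fraction of ideal linear scaling

# Eight cores enumerate 2^40 keys in the time one core enumerates 2^37.
throughput_ratio = 2**40 / 2**37

print(f"4-core speedup: {speedup:.2f}x (efficiency {efficiency:.0%})")
print(f"8-core throughput ratio: {throughput_ratio:.0f}x")
```

Read this way, the eight-core result corresponds to an eightfold throughput gain at equal run time, while the four-core timing for $2^{38}$ keys corresponds to a little over a twofold speedup.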
A Computing Environment 

All code was implemented using Java 1.7, with the exception of Glowacz 
et al.'s algorithm [6], which was implemented in Matlab to enable very fast 
convolution of the histograms. The language difference here was not an issue 
because key rank is so fast that we only ran accuracy comparisons and not 
timing comparisons. The implementation of Veyrat-Charvillon et al.'s key 
enumeration algorithm [13] was provided by the authors and translated into Java, 
allowing for direct speed comparisons. 

The single-core enumeration tests against Veyrat-Charvillon et al.'s algorithm, 
plotted in Fig. 10 (left), were run on a system running Arch Linux, 
with an Intel i7-4790S and 8 GB of system memory. 

Precision tests required more memory and as such were carried 
out on a system running Ubuntu, with an Intel Xeon E5-1650 and 32 GB of 
system memory. 

Finally, the multi-core tests plotted in Fig. 10 (right) were run in a cluster-based 
environment, where each individual node provided two Intel E5-2670s and 
64 GB of memory. 
Abstract. To design effective countermeasures for cryptosystems against 
side-channel power analysis attacks, the evaluation of the system leakage 
has to be lightweight and is often needed at an early stage, e.g., on the cryptographic 
algorithm or source code. When real implementations and power leakage 
measurements are not available, security evaluation has to rely on 
metrics for the information leakage of algorithms. In this work, we propose 
such a general and unified metric, information leakage amount (ILA). ILA 
has several distinct advantages over existing metrics. It unifies the measure 
of information leakage under various attacks: first-order and higher-order 
DPA and CPA attacks. It works on algorithms with no mask protection 
or with perfect/imperfect masking countermeasures. It is explicitly connected 
to the success rates of attacks, the ultimate security metric on physical 
implementations. Therefore, we believe ILA is an accurate indicator of the 
side-channel security level of the physical system, and can be used effectively 
and efficiently during the countermeasure design stage for choosing the 
best countermeasure. 


Keywords: Information leakage amount • Side-channel security • Power 
analysis attack 


1 Introduction 

In the past decade, various side-channel attacks (SCAs) utilizing the system 
power consumption information, such as differential power analysis (DPA) [16], 
correlation power analysis (CPA) [5], mutual information (MI) attacks [14] and 
template attacks [6], have been presented to exploit weaknesses in cryptographic 
implementations to recover the secret key. Masking is one of the most 
popular SCA countermeasures used to randomize sensitive variables [7]. When 
applying masking at a higher level, e.g., algorithmic or source code level, every 
key-sensitive intermediate variable is masked with at least one random value 
$M$ by a carefully designed masking function $f$, e.g., normally exclusive OR or 
multiplication. Therefore, during the cryptographic execution, any intermediate 
variable $Z$ is substituted by its masked counterpart, $f(Z, M)$, to prevent 
side-channel leakage. Perfectly masked devices with appropriate masking functions 
and unbiased random masks can eliminate first-order leakage, i.e., it is not 
feasible to break the system by exploiting only one time point of the power leakage 
traces, which corresponds to one intermediate variable. However, they are still 
susceptible to second-order and higher-order attacks, which combine two or more 
time points of power leakage to retrieve the secret key. Some practical masking 
schemes with limited implementation resources are not perfect and may still 
have some first-order leakage. 

How to evaluate a system's SCA vulnerability/resilience comprehensively and 
accurately under different attacks is an important research issue. Sound 
quantitative metrics are needed to guide the implementation of countermeasures 
and to compare the overall strength of countermeasures fairly. One widely used 
metric is the success rate, the probability that an attack succeeds given a number 
of side-channel leakage measurements [21]. This is indeed the ultimate practical 
measure of a system's SCA vulnerability/resilience, which depends on the 
cryptographic algorithm, the specific implementation (with power measurement data 
available), and the attack model (whether it is DPA, CPA, MIA, etc.) as 
illustrated in [12, 18]. We classify this metric as one for measuring the system physical 
leakage. In recent years, there has been research interest in applying other physical 
leakage metrics to the instructions of cryptographic software, thereby pinpointing 
the location of leakage to guide automatic implementation of countermeasures. 
Bayrak et al. [2] introduced a methodology for detecting power leakage, using 
an information theoretic metric, the mutual information MI$_L$ between the key 
and leakage measurements. Although not explicitly related to the success rate, 
the metric MI$_L$ can be used to bound the success rate [10,21] in some models. 
However, it requires power consumption data. There are also other efforts in 
evaluating the cryptosystem information leakage at an early stage, i.e., on the source 
code of cipher software or even on algorithms, with no need for power measurement 
data. The automatic software verification tools for SCA vulnerabilities [3,8] 
employ the mutual information between the secret key and intermediate variables, 
denoted as MI$_A$. The metric of quantitative masking strength, QMS, is defined 
by [11] to quantify the software leakage amount under imperfect masking, and 
a verification process is formulated to find the QMS value of cryptographic 
software source code. However, none of the prior work has shown the relationship 
between these system information leakage metrics and the success rate. It is not 
easy to translate a bound on these information leakage metrics (MI$_A$ and QMS) to the 
final security measure of the implemented physical system, the success rate. 

In this work, we propose a new unified metric, information leakage amount 
(ILA), to quantify the system information leakage under various power analysis 
attacks at the early cryptographic algorithm or software code level, whether the 
cipher is unprotected or protected with masking. What is more, we also relate this 
metric to the success rate of DPA/CPA attacks in analytic models. Note that in 
this work we choose DPA/CPA because it has been shown both theoretically and 
empirically that the first-order and second-order CPA attacks are equivalent to 
the strongest maximum-likelihood attacks under Gaussian noise models [9, 13, 15]. 
Our metric is unified, in the sense that it works on original algorithms with no 
masking, perfect masking, or imperfect masking under first-order DPA/CPA or 
second-order CPA. The success rate formulas are more general and simpler than 
the formulas in [9,12,13], which are only for first-order DPA/CPA on unmasked 
devices and for higher-order attacks on perfectly masked devices. Our explicit 
success rate formulas in terms of ILA bridge the gap between the system information 
leakage measure and the physical leakage measure. The metric ILA, as a good 
indicator of the ultimate side-channel security level of the physical system, can 
therefore be used during the countermeasure design stage (without real 
implementations and power measurements) effectively and efficiently for choosing the 
best countermeasure. 

Table 1 summarizes the properties of our metric and compares it with three 
other metrics, QMS, MI$_A$, and MI$_L$. A question mark means that the metric in 
the column may be able to achieve the objective in the row, but this has not been 
demonstrated in the literature. For example, work in [21] shows that the mutual 
information MI$_L$ has a monotonic relationship with the success rate of an attack 
with only two candidate keys, but in general MI$_L$ may not be converted to 
the success rate explicitly. 


Table 1. Comparison among ILA, QMS and MI as leakage evaluation metrics 

     Objective                                                 ILA   QMS   MI_A  MI_L 
  1  First-order DPA/CPA metric on software code/algorithm      ✓     ✓     ✓     ✗ 
  2  Relate to first-order DPA success rate                     ✓     ✓     ✓     ✓ 
  3  Relate to first-order CPA success rate                     ✓     ✗     ?     ? 
  4  Second-order DPA/CPA metric on software code/algorithm     ✓     ✗     ?     ✗ 
  5  Relate to second-order DPA/CPA success rate                ✓     ✗     ?     ? 

The rest of the paper is organized as follows. Section 2 gives an overview of the 
existing leakage metrics and defines our proposed metric. Section 3 establishes 
the success rate formula for CPAs in terms of our metrics. Section 4 presents 
experimental results to evaluate the metrics and compare them with others. 
Section 5 concludes the paper. 

2 Leakage Metrics for Cryptosystems with Masking 
Countermeasure 

In this section, we first introduce the notations used and existing metrics, and 
propose our unified metric ILA for first-order and second-order attacks on 
cryptographic algorithms with imperfect/perfect masking. We then analyze these 
metrics in the case of Boolean masking. 

2.1 Notations and Existing First-Order Metrics 

We denote sets by calligraphic letters (e.g., $\mathcal{X}$), denote random variables by 
capital letters (e.g., $X$) which take values on the set ($\mathcal{X}$), and denote observations of 
the random variables by lowercase letters (e.g., $x$). We let $X^{(i)}$ denote the $i$th bit 
of $X$. $\mathbb{P}_X$ and $\mathbb{E}_X$ are the notations for the probability and the expectation with 
respect to $X$, respectively. For a cryptographic system with masking protection, 
$K$, $X$, $M$ denote the random variables for the key, the plaintext, and the mask, 
respectively, and they take values in the sets $\mathcal{K}$, $\mathcal{X}$, $\mathcal{M}$. Let $F = f(X, K, M)$ denote 
the algorithmic intermediate variable that possibly leaks, which is an algorithmic 
function of the known input $X$, the unknown key $K$ and the random mask $M$. 
For a second-order attack on masked devices, there are two select functions. One 
is $V_0(X, K, M) = g(F)$, which works on a key-sensitive intermediate variable and 
therefore is also a function of the input $X$, the key $K$ and the mask $M$. Note that 
the select function for an attack is determined by the system's power model, and 
$g(\cdot)$ is usually the Hamming weight or Hamming distance. Without loss of generality, 
the other select function is $V_1 = g(M)$, which depends on the mask $M$ only. The 
mask may be biased, i.e., not following the uniform distribution. If the mask is 
unbiased and the masking operation is appropriate, we call it perfect masking. 
Let $k_c$ be the secret key, $k_g \in \mathcal{K}\setminus\{k_c\}$ be any other possible key hypothesis, and 
$N_k = |\mathcal{K}|$ be the size of the key set. 

A first-order attack uses only one select function $V_0$ that corresponds to one 
time point on power traces. Therefore a first-order leakage metric measures the 
leakage of one select function that can be sensitive to both key and mask. Given a 
plaintext $x$, the secret key $k_c$ and a random number $m$, the target select function 
value is $v_0 = V_0(x, k_c, m)$. The information leakage is measured by the dependency of 
$v_0$ on $k_c$. Under perfect masking, the distribution of $V_0$ is independent of $k_c$, and 
hence the secret key cannot be recovered from the leakage measurements of 
$v_0$. Otherwise, the system is still vulnerable to first-order power analysis attacks. There 
are mainly two existing first-order information leakage metrics. 

Eldib et al. [11] proposed to quantify the masking strength under DPA by 

$$\mathrm{QMS} = 1 - \Delta_{qms}, \quad \text{with } \Delta_{qms} = \max_{x,x' \in \mathcal{X},\; k,k' \in \mathcal{K}} \left| D_{x,k}(F) - D_{x',k'}(F) \right|, \qquad (1)$$

where $D_{x,k}(F)$ denotes the distribution of $F$ given $(x, k)$, and $\Delta_{qms}$ is the maximum 
distribution difference. For perfect masking, QMS is maximum and reaches one, 
which indicates that the key $K$ and the intermediate variable $F$ are statistically 
independent. Without masking, QMS = 0. For imperfect masking schemes, QMS 
is in the range of (0,1). 
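As a minimal sketch of Eq. (1), consider a one-bit Boolean masking $F = Z \oplus M$ with $\mathbb{P}(M = 1) = p$. We assume here, for illustration only, that the distribution $D_{x,k}(F)$ is summarized by the probability that $F$ equals 1, and that the pairs $(x, k)$ reach both values of the unmasked bit $Z$. The function name `qms_one_bit` is ours.

```python
def qms_one_bit(p):
    """QMS for F = Z xor M with a one-bit mask, P(M = 1) = p."""
    # P(F = 1 | Z = z): the mask flips z with probability p.
    prob_f_is_one = {0: p, 1: 1.0 - p}
    # Largest distribution difference over all pairs of unmasked values.
    delta = max(abs(prob_f_is_one[a] - prob_f_is_one[b])
                for a in (0, 1) for b in (0, 1))
    return 1.0 - delta


for p in (0.0, 0.25, 0.5):
    print(p, qms_one_bit(p))
```

In this toy setting the unmasked case ($p = 0$) gives QMS = 0 and the unbiased mask ($p = 1/2$) gives QMS = 1, matching the two extremes described above.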

The other metric uses the mutual information, an information theoretic 
quantity commonly used for leakage evaluation. The mutual information between two 
discrete random variables $X$ and $Y$ is defined as: 

$$\mathrm{MI}(X, Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}, \qquad (2)$$

where $p(x, y)$ is the joint probability distribution function of $X$ and $Y$, with 
$p(x)$ and $p(y)$ as the corresponding marginal functions. For continuous random 
variables, the summation in definition (2) is replaced by integrations. Work in 
[3,8] uses the mutual information between $K$ and $F$ to measure the information 
leakage at the software code level. This mutual information only depends on the 
algorithm and we denote it by MI$_A$ = MI$(K, F)$. In contrast, work in [2] uses 
the mutual information between $K$ and the leakage measurements $L$. We denote 
it by MI$_L$ = MI$(K, L)$, which is a physical leakage measure. 
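The mutual information of two discrete variables can be evaluated directly from a joint probability table. The sketch below does this for a toy case that is our own choice, not taken from [3,8]: a uniform bit $Z$ standing in for the key-dependent value, and its masked version $F = Z \oplus M$ with $\mathbb{P}(M = 1) = p$. The helper names are hypothetical.

```python
from math import log2


def mutual_information(joint):
    """MI in bits of a joint distribution given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), pr in joint.items():
        px[x] = px.get(x, 0.0) + pr
        py[y] = py.get(y, 0.0) + pr
    return sum(pr * log2(pr / (px[x] * py[y]))
               for (x, y), pr in joint.items() if pr > 0)


def joint_bit_and_masked_bit(p):
    """Joint law of (Z, F): Z a uniform bit, F = Z xor M, P(M = 1) = p."""
    joint = {}
    for z in (0, 1):
        for m, pm in ((0, 1.0 - p), (1, p)):
            key = (z, z ^ m)
            joint[key] = joint.get(key, 0.0) + 0.5 * pm
    return joint


p = 0.25
mi = mutual_information(joint_bit_and_masked_bit(p))
closed_form = 1 + (1 - p) * log2(1 - p) + p * log2(p)
print(mi, closed_form)
```

For this one-bit case the numerical value coincides with $1 + (1-p)\log_2(1-p) + p\log_2 p$, i.e., one minus the binary entropy of the mask bias.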

Note that there is no second-order system information leakage metric based 
on QMS or MI shown in the literature. In this work, we propose a general and 
unified metric on the selection functions ($V_0$ for first-order attacks, $V_0$ and $V_1$ 
for second-order attacks), which reflects the system susceptibility to attacks. 

2.2 Our Proposed Information Leakage Metric 

Eldib et al. [11] showed empirically that there is a relationship between QMS and 
the number of traces needed in DPA. However, there is no theoretical proof for 
such a relation, and how QMS relates to multi-bit CPA or higher-order attacks 
is unknown. We seek a new unified metric that reflects the information 
leakage at the algorithm level, similar to QMS and MI$_A$, and that can also be 
related explicitly to the success rate of different attacks, including DPA, CPA, 
and higher-order attacks. 

Fei et al. [13] defined the confusion coefficient, for unmasked algorithms, as 
$\kappa(k_c, k_g) = \mathbb{E}_X\{[V(X, k_c) - V(X, k_g)]^2\}$ for the selection function $V$, with the 
expectation being taken over $X$. Each confusion coefficient is defined between 
two key values. They showed that the confusion coefficients and the implementation 
signal-to-noise ratio (SNR) together explicitly determine the success rates for 
DPA and CPA. However, these confusion coefficients do not reflect the masking 
strength as they are defined for unmasked algorithms only. The confusion 
coefficients are also used to model the success rates for higher-order attacks with 
perfect masking in [9]. 
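To make the confusion coefficient concrete, the following sketch evaluates $\kappa(k_c, k_g)$ exhaustively for a toy target. The choices are ours: the 4-bit PRESENT S-box is used as a small stand-in for a real cipher component, the select function is the Hamming weight of the S-box output, and the plaintext is uniform.

```python
# PRESENT 4-bit S-box, used here only as a small illustrative stand-in.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(v):
    """Number of set bits of a small non-negative integer."""
    return bin(v).count("1")


def select(x, k):
    """Unmasked select function V(x, k) = HW(S(x xor k))."""
    return hamming_weight(SBOX[x ^ k])


def confusion(kc, kg):
    """kappa(kc, kg) = E_X[(V(X, kc) - V(X, kg))^2], X uniform on 4 bits."""
    return sum((select(x, kc) - select(x, kg)) ** 2 for x in range(16)) / 16


kc = 0x3
kappas = {kg: confusion(kc, kg) for kg in range(16) if kg != kc}
print(min(kappas.values()), max(kappas.values()))
print(sum(kappas.values()) / len(kappas))
```

Each wrong key has its own coefficient, so some hypotheses are easier to separate from the correct key than others; the last line prints their average.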

We propose to generalize the confusion coefficient definition to masked 
algorithms (possibly imperfect). We then propose the new metric ILA based on the 
generalized confusion coefficients. The ILA measures the information leakage of 
$V_0$ (and $V_1$) under the protection of any masking countermeasure. 

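Definition 1. We define the new first-order confusion coefficient $\kappa_{10}(k_c, k_g)$ of 
a masked algorithm as 

$$\kappa_{10}(k_c, k_g) = \mathbb{E}_X\left\{\left[\mathbb{E}_M(V_0 \mid (X, k_c)) - \mathbb{E}_M(V_0 \mid (X, k_g))\right]^2\right\}, \qquad (3)$$

where $\mathbb{E}_M(V_0 \mid (X, k))$ is the conditional expectation of $V_0$ given $(X, k)$ over $M$, 
and $\mathbb{E}_X$ is the expectation over $X$. 

Definition 2. The first-order information leakage amount ILA$_{10}$ is defined as 

$$\mathrm{ILA}_{10} = \mathbb{E}_{\mathcal{K}\setminus\{k_c\}}\left[\kappa_{10}(k_c, k_g)\right], \qquad (4)$$

where $\mathbb{E}_{\mathcal{K}\setminus\{k_c\}}$ is the expectation over all possible key hypotheses $k_g$ in $\mathcal{K}\setminus\{k_c\}$. 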
ILA$_{10}$, MI$_A$ and QMS are all metrics for sensitivity evaluation at the 
algorithm level that do not require leakage measurements. QMS focuses on the 
extreme value among the differences of distributions of any pair $(x, k), (x', k') \in 
(\mathcal{X}, \mathcal{K})$, but ignores the other differences. The extreme value indicates the 
probability distance between the secret key and the one guessed key which is easiest 
to distinguish. However, the SCA succeeds only if the secret key is distinguished 
from all other guessed keys, not just one. Hence the expectation is a 
better measure of information leakage than the extreme value. We can see that 
ILA$_{10}$ is an expectation of squared distances: 

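$$\mathrm{ILA}_{10} = \sum_{k_g \in \mathcal{K}\setminus\{k_c\}} p(k_g)\,\kappa_{10}(k_c, k_g) = \sum_{k_g \in \mathcal{K}\setminus\{k_c\}} p(k_g) \sum_{x \in \mathcal{X}} p(x)\left\{\mathbb{E}_M[V_0 \mid (x, k_c)] - \mathbb{E}_M[V_0 \mid (x, k_g)]\right\}^2. \qquad (5)$$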

The calculation of ILA$_{10}$ through Eq. (5) involves iterations over $k_g \in \mathcal{K}\setminus\{k_c\}$ 
and $x \in \mathcal{X}$, which can be time-consuming for large sets $\mathcal{K}$ and $\mathcal{X}$. These 
same iterations appear in the MI$_A$ and QMS definitions too. As recommended for 
MI calculations by [2,8], the exhaustive iterations in calculating ILA$_{10}$ can be 
replaced by averaging over a random subset of sufficiently large size. Thus the 
computational complexity is similar for the three metrics ILA$_{10}$, MI$_A$ and QMS. 
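The two ways of evaluating Eq. (5), exhaustive iteration and averaging over a random subset, can be sketched as follows. Everything specific here is an assumption made for illustration: a 4-bit PRESENT S-box as the target, Hamming-weight leakage $V_0 = HW(S(x \oplus k) \oplus M)$, independent mask bits with bias $p$, and uniform $p(x)$ and $p(k_g)$. The function names are ours.

```python
import random

# PRESENT 4-bit S-box: a small stand-in chosen for illustration (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
B = 4            # bits per intermediate value
N = 1 << B       # number of plaintexts, keys and masks


def hw(v):
    """Hamming weight of a small non-negative integer."""
    return bin(v).count("1")


def mask_prob(m, p):
    """P(M = m) when the B mask bits are independent with P(bit = 1) = p."""
    return p ** hw(m) * (1 - p) ** (B - hw(m))


def mean_v0(x, k, p):
    """E_M[V0 | (x, k)] for V0 = HW(S(x xor k) xor M)."""
    z = SBOX[x ^ k]
    return sum(mask_prob(m, p) * hw(z ^ m) for m in range(N))


def kappa10(kc, kg, p):
    """First-order confusion coefficient of Eq. (3), X uniform."""
    return sum((mean_v0(x, kc, p) - mean_v0(x, kg, p)) ** 2
               for x in range(N)) / N


def ila10(kc, p):
    """Exhaustive ILA_10 of Eq. (5), uniform over the wrong keys."""
    return sum(kappa10(kc, kg, p) for kg in range(N) if kg != kc) / (N - 1)


def ila10_sampled(kc, p, samples, seed=0):
    """Monte Carlo estimate of ILA_10 from random (k_g, x) pairs."""
    rng = random.Random(seed)
    wrong_keys = [kg for kg in range(N) if kg != kc]
    total = 0.0
    for _ in range(samples):
        kg, x = rng.choice(wrong_keys), rng.randrange(N)
        total += (mean_v0(x, kc, p) - mean_v0(x, kg, p)) ** 2
    return total / samples


p = 0.3
print("exhaustive:", ila10(0x3, p))
print("sampled   :", ila10_sampled(0x3, p, samples=2000))
```

With an unbiased mask ($p = 1/2$) the conditional expectations no longer depend on the key and the exhaustive value drops to zero, which is the expected behaviour for perfect masking.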

Different from MI$_A$ and QMS, we find that ILA$_{10}$ can be related to the 
success rates of DPA and CPA in explicit formulas, similar to the work in [13]. 
In addition, ILA$_{10}$ can be extended to a second-order metric ILA$_{20}$ as well, 
while there is no such work on MI$_A$ and QMS yet. 

A second-order attack retrieves the secret key by combining the information 
leakage at two leakage points, $V_0(X, K, M)$ and $V_1(M)$. A second-order metric 
measures the leakage under second-order CPA attacks. 

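Definition 3. For a key hypothesis $k_g \in \mathcal{K}\setminus\{k_c\}$, we define the second-order 
confusion coefficient of a masked algorithm as 

$$\kappa_{20}(k_c, k_g) = \mathbb{E}_X\left\{\left[\mathbb{E}_M(\tilde{V}_0 \tilde{V}_1 \mid (X, k_c)) - \mathbb{E}_M(\tilde{V}_0 \tilde{V}_1 \mid (X, k_g))\right]^2\right\}, \qquad (6)$$

where $\tilde{V}_i = V_i - \mathbb{E}_{X,M}[V_i]$, $i = 0, 1$, are the centered select function values. 

Definition 4. The second-order information leakage amount ILA$_{20}$ is defined 
as 

$$\mathrm{ILA}_{20} = \mathbb{E}_{\mathcal{K}\setminus\{k_c\}}\left[\kappa_{20}(k_c, k_g)\right]. \qquad (7)$$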

Comment: Although the definitions (4) and (7) of ILA depend on the correct key 
$k_c$, in many practical situations ILA is key-independent. The leaked intermediate 
values often depend on the key $k_c$ only through $X \oplus k_c$. In that case, for uniformly 
distributed plaintext $X$, the ILA is in fact independent of $k_c$ since $k_g \oplus k_c$ iterates 
over the same values for all $k_c$. 
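Definitions 3 and 4, together with the key-independence remark, can be checked by brute force on a small example. The setting below is again our own illustrative assumption: a 4-bit PRESENT S-box, $V_0 = HW(S(x \oplus k) \oplus M)$, $V_1 = HW(M)$, independent mask bits with bias $p$, and uniform plaintexts and key hypotheses.

```python
# PRESENT 4-bit S-box: illustrative stand-in only (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
B = 4
N = 1 << B


def hw(v):
    return bin(v).count("1")


def mask_prob(m, p):
    """P(M = m) for B independent mask bits, each 1 with probability p."""
    return p ** hw(m) * (1 - p) ** (B - hw(m))


def v0(x, k, m):
    return hw(SBOX[x ^ k] ^ m)


def v1(m):
    return hw(m)


def ila20(kc, p):
    """Brute-force ILA_20 following Eqs. (6) and (7)."""
    # Means used to centre V0 and V1. For a bijective S-box and uniform X,
    # E[V0] does not depend on the key, so kc is used for every hypothesis.
    mean_v0 = sum(mask_prob(m, p) * v0(x, kc, m)
                  for x in range(N) for m in range(N)) / N
    mean_v1 = sum(mask_prob(m, p) * v1(m) for m in range(N))

    def centred_product(x, k):
        # E_M[(V0 - E V0)(V1 - E V1) | (x, k)]
        return sum(mask_prob(m, p) * (v0(x, k, m) - mean_v0) * (v1(m) - mean_v1)
                   for m in range(N))

    total = 0.0
    for kg in range(N):
        if kg == kc:
            continue
        total += sum((centred_product(x, kc) - centred_product(x, kg)) ** 2
                     for x in range(N)) / N
    return total / (N - 1)


values = [ila20(kc, 0.3) for kc in range(N)]
print(min(values), max(values))   # the same value for every correct key
print(ila20(0x3, 0.5))            # unbiased mask: second-order leakage remains
```

Because the intermediate value depends on the key only through $x \oplus k$, the printed minimum and maximum over all correct keys agree up to rounding.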
2.3 Analysis of the Metrics Under Boolean Masking 

To better understand the metrics ILA, MI$_A$ and QMS, we compare them in detail 
for a specific setting of biased Boolean masking $F = Z(X, k) \oplus M$ as in [11], under 
several commonly used assumptions on the distribution of the unmasked $Z(X, k)$ and 
keys. Here $Z(X, k)$ denotes an unmasked intermediate variable with $X$ being the 
random plaintext. Hence $V_0 = g(Z(X, k) \oplus M)$. 

Assumption 1 (Uniform Intermediate Variable). Given a key $k \in \mathcal{K}$, for random 
plaintext $X$, the unmasked intermediate variable $Z(X, k)$ is uniformly distributed. 
That is, $Z(X, k) \sim U(0, 2^b - 1)$ for all $k \in \mathcal{K}$, where $U(0, 2^b - 1)$ denotes the 
discrete uniform distribution on $\{0, 1, \ldots, 2^b - 1\}$ with $b$ being the number of bits of 
$Z(X, k)$. 

Let $V_0^*(X, k) = g(Z(X, k))$ denote the unmasked select function. Under 
Assumption 1, $\mathbb{E}_X[V_0^*(X, k)]$ is a constant independent of the key $k$. In general, 
we would like the unmasked select function values under two different keys to 
be uncorrelated. 

Assumption 2 (Uncorrelated Keys). For any pair of keys $k_1, k_2 \in \mathcal{K}$, and random 
plaintext $X$, the select functions $V_0^*(X, k_1)$ and $V_0^*(X, k_2)$ are uncorrelated, 
so that $\mathbb{E}_X[V_0^*(X, k_1) V_0^*(X, k_2)] = \mathbb{E}_X[V_0^*(X, k_1)]\,\mathbb{E}_X[V_0^*(X, k_2)]$. 

Under Assumptions 1 and 2, $\mathbb{E}_X[V_0^*(X, k_1) V_0^*(X, k_2)] = \{\mathbb{E}_X[V_0^*(X, k_1)]\}^2$ 
will also be a constant independent of the keys $k_1$ and $k_2$. Unfortunately, many select 
functions (e.g., the Hamming weights of an AES S-Box output) do not satisfy 
Assumption 2. However, for a random key $k_2$, a weaker assumption often holds. 


Assumption 3 (Weak Uncorrelated Keys). For any fixed key $k_1$, let $k_2$ be a 
random key in $\mathcal{K}\setminus\{k_1\}$. For a random plaintext $X$, the intermediate variables 
$Z(X, k_1)$ and $Z(X, k_2)$ are uncorrelated, so that $\mathbb{E}_{X,k_2}[V_0^*(X, k_1) V_0^*(X, k_2)] = 
\{\mathbb{E}_X[V_0^*(X, k_1)]\}^2$. 

Under Assumptions 1 and 3, $\mathbb{E}_{X,k_2}[V_0^*(X, k_1) V_0^*(X, k_2)]$ is a constant, which 
helps us to derive simple explicit formulas of ILA in this section. Assumption 
3 makes the calculation of the metrics easier here, as it removes ILA's 
dependence on many aspects of the algorithm, including the value of $k_c$. The leakage metrics 
ILA under these assumptions reflect the masking strength only. In the next 
section, however, Assumption 3 will not be assumed for the derivation of the 
DPA/CPA success rates. 
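The difference between Assumptions 2 and 3 can be seen numerically on a small example. The sketch below uses a 4-bit PRESENT S-box with a Hamming-weight select function, which is our own illustrative choice rather than the AES S-Box mentioned above. It computes the gap $\mathbb{E}_X[V_0^*(X,k_1)V_0^*(X,k_2)] - \{\mathbb{E}_X[V_0^*]\}^2$ for every $k_2 \neq k_1$ and then its average over $k_2$.

```python
# PRESENT 4-bit S-box: illustrative stand-in only (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N = 16


def hw(v):
    return bin(v).count("1")


def v_star(x, k):
    """Unmasked select function V0*(x, k) = HW(S(x xor k))."""
    return hw(SBOX[x ^ k])


def cross_moment(k1, k2):
    """E_X[V0*(X, k1) V0*(X, k2)] for uniform X."""
    return sum(v_star(x, k1) * v_star(x, k2) for x in range(N)) / N


mean_sq = (sum(v_star(x, 0) for x in range(N)) / N) ** 2  # {E_X[V0*]}^2

k1 = 0
gaps = {k2: cross_moment(k1, k2) - mean_sq for k2 in range(N) if k2 != k1}
avg_gap = sum(gaps.values()) / len(gaps)

print("largest per-key gap :", max(abs(g) for g in gaps.values()))
print("gap averaged over k2:", avg_gap)
```

Assumption 2 would require every per-key gap to vanish, which clearly fails here, whereas the gap averaged over $k_2$ (the quantity Assumption 3 refers to) is much smaller in magnitude.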

We first consider the DPA attack, where $V_0$ is on a single bit. Since $\oplus$ is 
taken bit by bit, we can take both $Z(X, k_c)$ and $M$ as variables with one single 
bit, and $V_0 = Z \oplus M$. Let the distribution of the mask bit be $\mathbb{P}(M = 1) = p$ and 
$\mathbb{P}(M = 0) = 1 - p$; we have the following property. 

Property 1. For the DPA model under Assumptions 1 and 3, if $\mathbb{P}(M = 1) = p$, 
then 

A Unified Metric for Quantifying Information Leakage 345 


- ILA$_{10} = (1 - 2p)^2/2$, 

- ILA$_{20} = 2p^2(1 - p)^2$, 

- MI$_A = 1 + (1 - p)\log_2(1 - p) + p\log_2(p)$, 

- QMS $= 1 - |1 - 2p|$. 

The detailed calculations are given in Appendix A. Note that although the 
generalized confusion coefficients $\kappa_{10}(k_c, k_g)$ (Eq. 3) and $\kappa_{20}(k_c, k_g)$ (Eq. 6) are 
determined by the algorithm, their averages ILA$_{10}$ and ILA$_{20}$ become 
algorithm-independent under Assumption 3 and are determined only by the bias of the 
mask distribution, $p$. For perfect masking, $p = 1/2$; unmasked, 
$p = 0$ or $p = 1$; imperfect masking, $p$ takes other values. All metrics change 
with $p$ and have a one-to-one correspondence with each other. In particular, 
ILA$_{10} = (1 - \mathrm{QMS})^2/2$. Work in [11] empirically finds that the number of traces 
needed for DPA is approximately $N_{trace} = 1/(1 - \mathrm{QMS})^{2.2}$. In Sect. 4.1, we will 
show that the number of traces satisfies $N_{trace} \propto 1/\mathrm{ILA}_{10} \propto 1/(1 - \mathrm{QMS})^2$ instead. 

Figure 1 shows the relationship between these metrics and the probability $p$. 
It is symmetric about $p = 1/2$, which reflects that the mask bit values 
0 and 1 have the same effect. From Fig. 1, we see that ILA$_{10}$ and MI$_A$ have the same pattern, 
but ILA$_{10}$ increases from 0 to 1/2 and MI$_A$ increases from 0 to 1 as $p$ goes 
from 1/2 to 0 (or 1). When $p = 0$ or $p = 1$, the device is without any masking 
protection, QMS = 0 while ILA$_{10}$ and MI$_A$ both reach their maximum. When 
$p = 1/2$, the device is protected by perfect masking, ILA$_{10}$ = MI$_A$ = 0 and 
QMS = 1, which is consistent with no first-order information leakage. However, 
the second-order leakage still exists under perfect masking, and actually reaches 
its maximum (biggest leakage) of 1/8. As the mask gets more biased, the first-order 
leakage increases while the second-order leakage decreases. 
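The closed forms of Property 1 are simple enough to tabulate directly, which reproduces the behaviour described for Fig. 1. The sketch below evaluates the four expressions for a few mask biases; the function name `metrics` and the chosen grid of $p$ values are ours.

```python
from math import log2


def xlog2x(x):
    """x * log2(x) with the usual convention 0 * log2(0) = 0."""
    return x * log2(x) if x > 0 else 0.0


def metrics(p):
    """Closed forms of Property 1 for a one-bit mask with P(M = 1) = p."""
    ila10 = (1 - 2 * p) ** 2 / 2
    ila20 = 2 * p ** 2 * (1 - p) ** 2
    mi_a = 1 + xlog2x(1 - p) + xlog2x(p)
    qms = 1 - abs(1 - 2 * p)
    return ila10, ila20, mi_a, qms


print("   p   ILA10   ILA20    MI_A     QMS")
for p in (0.0, 0.1, 0.25, 0.4, 0.5):
    ila10, ila20, mi_a, qms = metrics(p)
    print(f"{p:4.2f}  {ila10:6.4f}  {ila20:6.4f}  {mi_a:6.4f}  {qms:6.4f}")
```

At $p = 1/2$ the first-order quantities vanish while ILA$_{20}$ equals 1/8, and for every $p$ the identity ILA$_{10} = (1 - \mathrm{QMS})^2/2$ holds.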

Fig. 1. The quantities of several metrics under the biased masking for DPA. 

Next we consider CPA in this setting. For CPA, $V_0 = HW(Z \oplus M)$ is the 
Hamming weight function of a $b$-bit variable. We assume that the bits in the mask 
are independent and identically distributed with $\mathbb{P}(M^{(i)} = 1) = p$, $i = 1, \ldots, b$. 
Here $M^{(i)}$ denotes the $i$th bit of the $b$-bit mask variable $M$. 

Property 2. For the CPA model under Assumptions 1 and 3, 

$$\mathrm{ILA}_{10} = b(1 - 2p)^2/2, \qquad \mathrm{ILA}_{20} = 2b\,p^2(1 - p)^2. \qquad (8)$$

The proof is provided in Appendix B. 

For the CPA model, ILA$_{10}$ and ILA$_{20}$ follow a similar pattern as in the 
DPA model, just differing by a factor of $b$, the number of bits. In fact, the DPA 
model is a special case of the CPA model with $b = 1$. The other two metrics MI$_A$ 
and QMS are harder to derive for CPA. It is hard, if not impossible, to relate 
MI$_A$ and QMS to the success rate of CPA. 

3 Relating ILA to DPA and CPA Success Rates 

As shown in [9,12,13], the success rates of first-order DPA and CPA on unmasked 
devices and second-order CPA on perfectly masked devices can all be expressed 
in terms of the confusion coefficients and the implementation signal-to-noise 
ratio (SNR). Our metrics ILA$_{10}$ and ILA$_{20}$ are algorithmic properties like the 
confusion coefficients. We generalize the results of [9,12,13] to masked 
implementations (possibly with imperfect masking), and show that the success rates 
of CPA/DPA are determined by the SNRs and our generalized confusion 
coefficients. The formulas are further simplified to consist of ILA$_{10}$ and 
ILA$_{20}$. We show derivations for the success rates of first-order and second-order 
DPA and CPA on masked devices in this section. We then use these metrics to 
compare the first-order leakage and second-order leakage. 


3.1 First-Order Power Analysis Attack Model 

We assume a commonly used linear power consumption model with additive 
noise for both DPA and CPA, 

$$L_0 = c_0 + \epsilon_0 V_0 + \sigma_0 r_0, \qquad (9)$$

where $r_0$ is the unit noise variable (the mean is 0 and the variance is 1) and 
$\epsilon_0$ is the single-bit unit power consumption. Hence the physical system SNR is 
$\delta_0 = \epsilon_0/\sigma_0$. We derive the success rate formulas for first-order CPA in terms of 
the SNR and ILA$_{10}$, and consider DPA as a special case of CPA with $b = 1$. Notice 
that some other researchers define the SNR differently as $SNR^* = \epsilon_0^2 \mathrm{Var}(V_0)/\sigma_0^2$, 
which also includes the variance of the intermediate value $V_0$. We consider $\mathrm{Var}(V_0)$ 
to be part of the algorithmic leakage measured by ILA$_{10}$, since it depends on $V_0$. 
Our SNR reflects purely the physical system property, since $\epsilon_0$ reflects the power 
consumption differential caused by one bit. 

The leakage measurements of $L_0$ are denoted as $\mathcal{L} = \{l_{1,0}, l_{2,0}, \ldots, l_{n,0}\}$, where 
$n$ is the number of traces. For unmasked devices, the CPA exploits the 
correlation between the leakage $L_0$ and the unmasked select function $V_0^* = g(Z(X, k))$ 
to discover the secret key. For masked devices, the attacker does not know the value of $M$, 
and therefore does not know the value of $V_0 = g(F(X, k, M))$. To conduct 
CPA, the attacker has to correlate $L_0$ with $\mathbb{E}_M[V_0(X, k, M)]$, the expectation of 
$V_0$ over all possible mask values. This value is $V_0^*(X, k)$ for unmasked devices, 
and is a constant (thus no leakage) for perfectly masked devices. Let $v^g_{m_i,i,0}$ 
denote $V_0(x_i, k_g, m_i)$ for the $i$-th power trace, the selection function value under 
plaintext $x_i$, guess key $k_g$ and the mask $m_i$; let $\mathbb{E}_M(v^g_{m,i,0})$ denote the targeted 
expectation of $V_0(x_i, k_g, m)$ over all $m \in \mathcal{M}$, and $\mathbb{E}(V_0^g)$ denote the expectation 
of $V_0(x, k_g, m)$ over all $x \in \mathcal{X}$ and $m \in \mathcal{M}$. Under the power model (9) with 
imperfect masking, the first-order CPA distinguishes the key $k_g$ by Pearson's 
correlation: 

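$$\rho_g = \frac{\sum_{i=1}^{n} (l_{i,0} - \bar{l}_{\cdot,0})\left[\mathbb{E}_M(v^g_{m,i,0}) - \mathbb{E}(V_0^g)\right]}{\sqrt{\sum_{i=1}^{n} (l_{i,0} - \bar{l}_{\cdot,0})^2 \sum_{i=1}^{n} \left[\mathbb{E}_M(v^g_{m,i,0}) - \mathbb{E}(V_0^g)\right]^2}}, \qquad (10)$$

where $\bar{l}_{\cdot,0} = \sum_{i=1}^{n} l_{i,0}/n$ is the mean of the power leakage. 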
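The attack described by Eqs. (9) and (10) can be sketched as a small simulation. All concrete choices below are assumptions made for illustration and are not the experimental setup of this paper: a 4-bit PRESENT S-box, Hamming-weight leakage, independent mask bits with bias $p$, Gaussian unit noise, and the helper names `simulate_traces` and `cpa`. The prediction for each key guess uses the closed form $\mathbb{E}_M[HW(z \oplus M)] = bp + (1 - 2p)HW(z)$, which holds for independent mask bits.

```python
import random
from math import sqrt

# PRESENT 4-bit S-box: illustrative stand-in only (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
B = 4


def hw(v):
    return bin(v).count("1")


def expected_v0(x, k, p):
    """E_M[HW(S(x xor k) xor M)] for independent mask bits, P(bit = 1) = p."""
    return B * p + (1 - 2 * p) * hw(SBOX[x ^ k])


def pearson(a, b):
    """Sample Pearson correlation; returns 0.0 if either side is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return num / den if den > 0 else 0.0


def simulate_traces(kc, p, n, c0=0.0, eps0=1.0, sigma0=1.0, seed=1):
    """Draw n leakages L0 = c0 + eps0 * V0 + sigma0 * r0 as in Eq. (9)."""
    rng = random.Random(seed)
    xs, leaks = [], []
    for _ in range(n):
        x = rng.randrange(1 << B)
        m = sum((rng.random() < p) << i for i in range(B))  # biased mask
        v0 = hw(SBOX[x ^ kc] ^ m)
        xs.append(x)
        leaks.append(c0 + eps0 * v0 + sigma0 * rng.gauss(0.0, 1.0))
    return xs, leaks


def cpa(xs, leaks, p):
    """Rank key guesses by the correlation of Eq. (10); return the best one."""
    scores = {kg: pearson(leaks, [expected_v0(x, kg, p) for x in xs])
              for kg in range(1 << B)}
    return max(scores, key=scores.get), scores


xs, leaks = simulate_traces(kc=0xA, p=0.1, n=2000)
best, scores = cpa(xs, leaks, p=0.1)
print(hex(best), round(scores[best], 3))
```

With an unbiased mask ($p = 1/2$) the prediction `expected_v0` is the same constant for every guess, so all correlations are reported as zero and the first-order attack carries no information, in line with the discussion of perfect masking.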

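The CPA succeeds when $\rho_c - \rho_g > 0$ for all $k_g \in \mathcal{K}\setminus\{k_c\}$. For a random plaintext 
attack with a large number of traces, under Assumption 1, the denominator 
of (10) converges to the same limit for all $k_g$, since $\mathbb{E}[\mathbb{E}_M(v^g_{m,i,0})] = \mathbb{E}(V_0^g) = 
\mathbb{E}(V_0^c)$ and $\mathbb{E}\{\mathbb{E}_M[(v^g_{m,i,0})^2]\} = \mathbb{E}\{\mathbb{E}_M[(v^c_{m,i,0})^2]\}$. Hence $\rho_c - \rho_g > 0$ is equivalent 
to the difference in the numerators of (10) being positive. That is, $\rho_c - \rho_g > 0$ 
when $\Delta_n^{10}(k_c, k_g) > 0$, where 

$$\Delta_n^{10}(k_c, k_g) = \frac{1}{n}\sum_{i=1}^{n} \frac{l_{i,0} - \bar{l}_{\cdot,0}}{\sigma_0}\left[\mathbb{E}_M(v^c_{m,i,0}) - \mathbb{E}_M(v^g_{m,i,0})\right]. \qquad (11)$$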

Let $\Delta_n^{10}$ denote the $(N_k - 1)$-dimensional vector consisting of these $\Delta_n^{10}(k_c, k_g)$ 
for all $k_g \in \mathcal{K}\setminus\{k_c\}$. Let $\mu$ and $\Sigma$ denote the mean and variance of the 
single-trace vector $\Delta_1^{10}$. Then following the work in [13,20], the success rate can be 
described with a multivariate Gaussian distribution $N(\mu, \Sigma/n)$ using the Central Limit Theorem. 
That is, 

$$SR = \Phi_{\Sigma}(\sqrt{n}\,\mu), \qquad (12)$$

where $\Phi_{\Sigma}$ is the cumulative distribution function (CDF) of the $(N_k - 1)$-dimensional 
Gaussian distribution with mean 0 and variance $\Sigma$. 

For unmasked devices, the mean vector $\mu$ and the variance matrix $\Sigma$ are 
expressed by Fei et al. [13] in terms of their confusion coefficients $\kappa$. With imperfect 
masking, we show (in Appendix C) that similar expressions hold with our 
generalized confusion coefficients $\kappa_{10}$. 


Theorem 1. Under CPA leakage model (9), the success rate of the CPA is given 
by Eq. (12). Under Assumption 1, the element in the mean vector $\mu$ corresponding 
to key $k_{g_i}$ is 

$$\mu_{g_i} = \frac{1}{2}\,\delta_0\,\kappa_{10}(k_c, k_{g_i}). \qquad (13)$$

And the elements of the covariance matrix $\Sigma$ are 

$$\Sigma_{k_{g_i}, k_{g_i}} = \kappa_{10}(k_c, k_{g_i}), \qquad \Sigma_{k_{g_i}, k_{g_j}} = \kappa_{10}(k_c, k_{g_i}, k_{g_j}) \ \text{ for } k_{g_i} \neq k_{g_j}, \qquad (14)$$

where $\kappa_{10}(k_c, k_{g_i}, k_{g_j}) = \mathbb{E}_X\{[\mathbb{E}_M(v^c_{m,i,0}) - \mathbb{E}_M(v^{g_i}_{m,i,0})][\mathbb{E}_M(v^c_{m,i,0}) - \mathbb{E}_M(v^{g_j}_{m,i,0})]\}$. 

Similar to [13], we can get the above three-way generalized confusion 
coefficients $\kappa_{10}(k_1, k_2, k_3)$ from the two-way generalized confusion coefficients $\kappa_{10}(k_1, k_2)$ 
(see more details in Appendix D). 


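Lemma 1. Given $k_c, k_{g_i}, k_{g_j} \in \mathcal{K}$, 

$$\kappa_{10}(k_c, k_{g_i}, k_{g_j}) = \frac{1}{2}\left[\kappa_{10}(k_c, k_{g_i}) + \kappa_{10}(k_c, k_{g_j}) - \kappa_{10}(k_{g_i}, k_{g_j})\right]. \qquad (15)$$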
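The identity relating three-way and two-way generalized confusion coefficients can be checked numerically. The sketch below does so on a toy setting of our own choosing (4-bit PRESENT S-box, Hamming-weight leakage, independent mask bits with bias $p$), reading the three-way coefficient as $\mathbb{E}_X$ of the product of the two differences of conditional expectations; the closed form used for $\mathbb{E}_M[V_0 \mid (x, k)]$ is valid for independent mask bits.

```python
# PRESENT 4-bit S-box: illustrative stand-in only (assumption).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
B = 4
N = 1 << B


def hw(v):
    return bin(v).count("1")


def mean_v0(x, k, p):
    """E_M[HW(S(x xor k) xor M)] for independent mask bits, P(bit = 1) = p."""
    return B * p + (1 - 2 * p) * hw(SBOX[x ^ k])


def kappa2(ka, kb, p):
    """Two-way generalized confusion coefficient."""
    return sum((mean_v0(x, ka, p) - mean_v0(x, kb, p)) ** 2
               for x in range(N)) / N


def kappa3(kc, kgi, kgj, p):
    """Three-way generalized confusion coefficient."""
    return sum((mean_v0(x, kc, p) - mean_v0(x, kgi, p))
               * (mean_v0(x, kc, p) - mean_v0(x, kgj, p))
               for x in range(N)) / N


def lemma1_gap(p):
    """Largest deviation between the two sides over all key triples."""
    worst = 0.0
    for kc in range(N):
        for kgi in range(N):
            for kgj in range(N):
                lhs = kappa3(kc, kgi, kgj, p)
                rhs = 0.5 * (kappa2(kc, kgi, p) + kappa2(kc, kgj, p)
                             - kappa2(kgi, kgj, p))
                worst = max(worst, abs(lhs - rhs))
    return worst


print(lemma1_gap(0.3))
```

The deviation is at the level of floating-point rounding, as expected from the pointwise identity $(a - b_i)(a - b_j) = \frac{1}{2}[(a - b_i)^2 + (a - b_j)^2 - (b_i - b_j)^2]$.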

The average of $\kappa_{10}(k_c, k_{g_i})$ over all $k_{g_i}$ is ILA$_{10}$. By Lemma 1, the average 
of $\kappa_{10}(k_c, k_{g_i}, k_{g_j})$ over all $k_{g_i} \neq k_{g_j}$ is ILA$_{10}/2$. Replacing all the confusion 
coefficient terms in Eqs. (13) and (14) by their averages, we get an approximate 
asymptotic success rate for first-order CPA on masked devices: 

$$SR = \Phi_{\frac{1}{2}\left[I_{N_k-1} + J_{N_k-1}\right]}\left(\frac{\delta_0 \sqrt{n}\sqrt{\mathrm{ILA}_{10}}}{2}\,\mathbf{1}_{N_k-1}\right), \qquad (16)$$

where $I_{N_k-1}$ is the $(N_k - 1) \times (N_k - 1)$ identity matrix with diagonal entries of 
ones and off-diagonal entries of zeros, $J_{N_k-1}$ is the $(N_k - 1) \times (N_k - 1)$ matrix 
with all entries of ones, and $\mathbf{1}_{N_k-1}$ is the $(N_k - 1)$-dimensional vector of ones. 

The approximate SR formula (16) is very close to the SR formula (12) for 
small SNR $\delta_0$. We will examine the approximation in Sect. 3.3. 
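A formula of the type (16) is cheap to evaluate numerically. The sketch below rests on our reading of (16), namely a Gaussian CDF with covariance $\frac{1}{2}(I + J)$ evaluated at the threshold $a = \delta_0\sqrt{n}\sqrt{\mathrm{ILA}_{10}}/2$ in every coordinate, which is what averaging Theorem 1 with Lemma 1 gives; the exact constants should be treated as an assumption. It uses the representation $Y_i = (W_0 + W_i)/\sqrt{2}$ with independent standard normals, so that $SR = \mathbb{E}_{W_0}[\Phi(\sqrt{2}a - W_0)^{N_k - 1}]$ reduces to a one-dimensional integral.

```python
from math import erf, exp, pi, sqrt


def phi(w):
    """Standard normal density."""
    return exp(-w * w / 2) / sqrt(2 * pi)


def Phi(t):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(t / sqrt(2)))


def success_rate(delta0, n, ila10, n_keys, steps=4000, width=8.0):
    """Approximate first-order CPA success rate, our reading of Eq. (16)."""
    a = delta0 * sqrt(n) * sqrt(ila10) / 2
    # Y_i = (W_0 + W_i) / sqrt(2) has covariance (I + J) / 2, hence
    # P(all Y_i <= a) = E_{W_0}[ Phi(sqrt(2) a - W_0)^(n_keys - 1) ].
    h = 2 * width / steps
    total = 0.0
    for i in range(steps + 1):           # trapezoidal rule on [-width, width]
        w = -width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * phi(w) * Phi(sqrt(2) * a - w) ** (n_keys - 1)
    return total * h


for n in (100, 1000, 5000):
    print(n, round(success_rate(delta0=0.1, n=n, ila10=2.0, n_keys=256), 4))
```

Two sanity checks follow from this form: with $n = 0$ traces the value equals the random-guess probability $1/N_k$, and for a fixed target success rate the required $n$ scales as $1/(\delta_0^2\,\mathrm{ILA}_{10})$.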


3.2 Second-Order Power Analysis Attack Model 

A second-order power analysis attack combines the two leakage measurements of 
$V_0$ and $V_1$ at two different positions involving the same mask $M$ to break the 
masking protection. Similar to (9), we assume linear leakage for $V_1$, 

$$L_1 = c_1 + \epsilon_1 V_1 + \sigma_1 r_1, \qquad (17)$$

where $r_1$ is the unit noise. 

Second-order CPA uses $n$ pairs of independent realizations of noisy physical 
leakage $(l_{1,0}, l_{1,1}), (l_{2,0}, l_{2,1}), \ldots, (l_{n,0}, l_{n,1})$ for $(L_0, L_1)$. Here $l_{i,j} = c_j + \epsilon_j v_{i,j} + 
\sigma_j r_{i,j}$, $i = 1, \ldots, n$, $j = 0, 1$. Denote the centered versions of $L_j$ and $V_j$ by $\tilde{L}_j = 
L_j - \mathbb{E}(L_j)$ and $\tilde{V}_j = V_j - \mathbb{E}(V_j)$, for $j = 0, 1$. While the first-order CPA exploits 
the correlation between $L_0$ and $V_0$, the second-order CPA exploits the correlation 
between $\tilde{L}_0 \tilde{L}_1$ and $\tilde{V}_0 \tilde{V}_1$. That is, it uses the centered product statistic: 

$$\frac{1}{n}\sum_{i=1}^{n} \tilde{l}_{i,0}\,\tilde{l}_{i,1}\,\mathbb{E}_M(\tilde{v}^g_{m,i,0}\,\tilde{v}_{m,i,1}), \qquad (18)$$

where $\tilde{l}_{i,j} = (l_{i,j} - \bar{l}_{\cdot,j})/\sigma_j$, $j = 0, 1$, is the centered leakage, $\tilde{v}^g_{m,i,0} = v^g_{m,i,0} - \mathbb{E}[V_0^g]$ 
and $\tilde{v}_{m,i,1} = v_{m,i,1} - \mathbb{E}[V_1]$ are the centered select function values under the guessed 
key $k_g$ given mask $m$, and $v_{m,i,1} = V_1(m)$. 

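We denote the difference between the centered product statistics under the secret 
key $k_c$ and the guessed key $k_g$ as 

$$\Delta_n^{20}(k_c, k_g) = \frac{1}{n}\sum_{i=1}^{n} \tilde{l}_{i,0}\,\tilde{l}_{i,1}\left[\mathbb{E}_M(\tilde{v}^c_{m,i,0}\,\tilde{v}_{m,i,1}) - \mathbb{E}_M(\tilde{v}^g_{m,i,0}\,\tilde{v}_{m,i,1})\right]. \qquad (19)$$

The second-order CPA succeeds when $\Delta_n^{20}(k_c, k_g) > 0$ for all $k_g \in \mathcal{K}\setminus\{k_c\}$. 
Using derivations in [9,17,19], the success rate of second-order CPA also follows 
Eq. (12): $SR = \Phi_{\Sigma}(\sqrt{n}\,\mu)$. 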

Ding et al. [9] expressed $\mu$ and $\Sigma$ in terms of confusion coefficients $\kappa$ under 
perfect masking. With possibly imperfect masking, we generalize the formula in 
terms of our generalized confusion coefficients $\kappa_{20}$ (see details in Appendix E). 

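Theorem 2. Under CPA leakage model (9) and (24), the success rate of the 
second-order CPA is given by Eq. (12). Under Assumption 1, the element in $\mu$ 
corresponding to key $k_{g_i}$ is 

$$\mu_{g_i} = \frac{1}{2}\,\delta_0 \delta_1\,\kappa_{20}(k_c, k_{g_i}). \qquad (20)$$

And the elements of the covariance $\Sigma$ are 

$$\Sigma_{k_{g_i}, k_{g_i}} = \kappa_{20}(k_c, k_{g_i}), \qquad \Sigma_{k_{g_i}, k_{g_j}} = \kappa_{20}(k_c, k_{g_i}, k_{g_j}) \ \text{ for } k_{g_i} \neq k_{g_j}, \qquad (21)$$

where $\kappa_{20}(k_c, k_{g_i}, k_{g_j}) = \mathbb{E}_X\{[\mathbb{E}_M(\tilde{v}^c_{m,i,0}\tilde{v}_{m,i,1}) - \mathbb{E}_M(\tilde{v}^{g_i}_{m,i,0}\tilde{v}_{m,i,1})][\mathbb{E}_M(\tilde{v}^c_{m,i,0}\tilde{v}_{m,i,1}) - 
\mathbb{E}_M(\tilde{v}^{g_j}_{m,i,0}\tilde{v}_{m,i,1})]\}$. 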

Similar to Lemma 1, for $k_c, k_{g_i}, k_{g_j} \in \mathcal{K}$, 

$$\kappa_{20}(k_c, k_{g_i}, k_{g_j}) = \frac{1}{2}\left[\kappa_{20}(k_c, k_{g_i}) + \kappa_{20}(k_c, k_{g_j}) - \kappa_{20}(k_{g_i}, k_{g_j})\right]. \qquad (22)$$

As in the first-order analysis, replacing the generalized confusion coefficients 
$\kappa_{20}$ by ILA$_{20}$, we get the approximate asymptotic success rate: 

$$SR = \Phi_{\frac{1}{2}\left[I_{N_k-1} + J_{N_k-1}\right]}\left(\frac{\delta_0 \delta_1 \sqrt{n}\sqrt{\mathrm{ILA}_{20}}}{2}\,\mathbf{1}_{N_k-1}\right). \qquad (23)$$

Next we evaluate the above approximations. 

3.3 Approximation Errors in the Simple Success Rate Formulas 

Work in [9,13] gives the explicit theoretical success rate formulas for two cases:
the first-order CPA on unmasked devices and the second-order CPA on perfectly
masked devices, respectively. By plugging in $ILA_{10}$ with $p = 0$ in (16) and $ILA_{20}$
with $p = 1/2$ in (23), we get the two corresponding simple success rate formulas.
Compared to the formulas in [9,13], our simple formulas ignore some higher-order
terms and replace the confusion coefficients by ILA. Here we study the effect of
this simplification for CPA on unmasked and perfectly masked AES.

We show the difference between our simplified success rate formulas and
the explicit success rate formulas of [9,13] in Fig. 2. The average error-ratio is
defined as $E_{SR}[\,|N_{Explicit,SR} - N_{Simple,SR}| / N_{Explicit,SR}\,]$, where $N_{Explicit,SR}$ and
$N_{Simple,SR}$ are the numbers of traces needed to achieve a fixed SR value by the
explicit and simplified theoretical success rate formulas respectively, and $E_{SR}$ is
the expectation over success rate values SR ranging from 0 to 1. Here, we
take the expectation over the discrete success rate values SR = 0.1, 0.2, 0.3, ..., 0.9.



Fig. 2. The average error-ratio of number of measurements between explicit and sim- 
plified success rate formulas on AES S-Box 


Figure 2 shows that as the SNR grows, both error ratios increase. The error
ratio stays within 10% when SNR < 0.26 for the first-order attack, and when
SNR < 0.16 for the second-order attack. Hence the simplified success rate formulas
in Eqs. (16) and (23) work well for small SNR values. For practical physical
implementations, devices with large SNR values are very leaky and not considered
secure, so the success rate analysis is only meaningful when the SNR is small.

3.4 Comparing Effectiveness of the First-Order Attack and the 
Second-Order Attack 

For unmasked devices, first-order leakage is sufficient to discover the secret key. 
With perfect masking, only second-order leakage can be used to discover the 
secret key. However, for imperfect masking implementations, both first-order 
and second-order leakage exist. Which leakage is more effective to exploit? We 
can compare them using the proposed metrics through formulas (16) and (23). 

Property 3. For a masked implementation

- The first-order attack is more effective when $\delta_1 < \sqrt{ILA_{10}/ILA_{20}}$;

- The second-order attack is better when $\delta_1 > \sqrt{ILA_{10}/ILA_{20}}$.

For very small SNR, the first-order leakage will dominate. The threshold SNR
value to determine dominance by the first-order or the second-order leakage is
given by the square root of the ratio between the two information leakages:
$\sqrt{ILA_{10}/ILA_{20}}$. If a typical SNR value is known for certain physical devices, we
can predict which type of leakage dominates and therefore guide the software
designer in effective leakage reduction.
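As a small worked example, the threshold test of Property 3 can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, and the input values are the ones reported for the two AES data sets in Sect. 4.2 ($\delta_1 = 0.12$).

```python
import math


def dominance_ratio(ila10, ila20, delta1):
    """sqrt(ILA_10 / ILA_20) / delta_1; a value above 1 favors the first-order attack."""
    return math.sqrt(ila10 / ila20) / delta1


def more_effective_attack(ila10, ila20, delta1):
    """Apply Property 3; the boundary case (ratio exactly 1) is reported as second-order."""
    return "first-order" if dominance_ratio(ila10, ila20, delta1) > 1 else "second-order"


# AES data sets from Sect. 4.2
print(round(dominance_ratio(0.338, 13.8, 0.12), 2))   # first data set
print(round(dominance_ratio(0.174, 15.7, 0.12), 2))   # second data set
```

The two printed ratios agree with the values 1.3 and 0.88 quoted in Sect. 4.2 for the AES data sets.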


3.5 Extension to Higher-Order Power Analysis Attack Model 

We now consider a cryptographic algorithm protected by J-th order masking,
with mask shares $M_1, M_2, \ldots, M_J$. A J-th order attack combines the information
leakage of $V_0(X, K, M_1, \ldots, M_J)$ and the leakage of $V_1(M_1), \ldots, V_J(M_J)$ to retrieve
the secret key. That is, the J-th order power analysis attack combines the J + 1
leakage measurements of $V_0, V_1, \ldots, V_J$ at J + 1 different positions to break the
masking protection. The leakage vector is $l_i = (l_{i,0}, \ldots, l_{i,J})$. Similar to (9) and
(24), the leakage model is now:

    L_j = c_j + \epsilon_j V_j + \sigma_j r_j, \quad j = 0, \ldots, J,    (24)

where $r_j$ is the unit noise.

For a key hypothesis $k_g \in \mathcal{K} \setminus \{k_c\}$, we define the J-th order confusion
coefficient of the masked algorithm as

    \kappa_{J0}(k_c, k_g) = E_X\{[E_M(\tilde V_0 \tilde V_1 \cdots \tilde V_J \mid (X, k_c)) - E_M(\tilde V_0 \tilde V_1 \cdots \tilde V_J \mid (X, k_g))]^2\},    (25)

where $\tilde V_i = V_i - E_{X,M}[V_i]$, $i = 0, 1, \ldots, J$, are the centered select function values.

Definition 5. The J-th order information leakage amount $ILA_{J0}$ is defined as

    ILA_{J0} = E_{\mathcal{K} \setminus \{k_c\}}[\kappa_{J0}(k_c, k_g)].    (26)

As in [9] and in Sect. 3.2, we can derive the approximate asymptotic success
rate as:

    SR = \Phi_{\frac{1}{2}[I_{N_k-1} + J_{N_k-1}]} \left( \frac{\sqrt{n}\, \sqrt{ILA_{J0}}\, \prod_{j=0}^{J} \delta_j}{2} \right).    (27)

4 Numerical Results 

In this section, we first numerically investigate the relationship between the
success rates of DPA/CPA and the metrics ILA, $MI_A$ and QMS on synthetic data
examples. We then evaluate our metrics and the simplified success rates of
DPA/CPA on realistic measurement data.

4.1 Numerical Comparison of Metrics Versus Success Rates

We first show, by numerical examples, that $ILA_{10}$ measures the leakage
information amount under CPA, but $MI_A$ and QMS do not. We consider synthetic
data examples with biased masking on the outputs of an AES S-Box, where the
masking bits are independent with $p_i = P(M^{(i)} = 1)$, $i = 1, 2, \ldots, 8$.

In the first example, the last 4 bits are perfectly masked with $p_5 = p_6 =
p_7 = p_8 = 0.5$, and the information leakage is through the Hamming weights
of the first 4 bits according to model (9). We consider two cases where
$\vec p_4 = [p_1, p_2, p_3, p_4] = [0.5, 0.2, 0.2, 0.1]$ and $\vec p_4 = [0, 0.4, 0.4, 0.4]$ respectively.
We calculate the values of the different metrics through the definitions in
Eqs. (1), (2) and (5), rather than using the specialized formulas in Properties 1
and 2 (which only apply to Boolean masking with equal $p$ for each bit). Detailed
algorithms are provided in Appendix F. In both cases $MI_A = 1.09$, but the
information leakage amount differs, with $ILA_{10} = 0.68$ and $ILA_{10} = 0.56$,
respectively. Figure 3(a) shows the success rates of CPA in both cases on synthetic
data generated from the power model (9) with SNR = 0.1. The empirical success
rate for a fixed number of measurements $N_{trace}$ is found by repeatedly randomly
sampling $N_{trace}$ traces for an attack, and calculating the proportion of attacks
that retrieve the correct secret key. We see that $ILA_{10}$ correctly predicts the two
different CPA success rate curves (with a difference of about 10%), while according
to $MI_A$ the information leakage should be the same in these two cases. Note that
from Fig. 2, the error ratio of our simplified SR formula under first-order CPA is
only 1.5% when SNR = 0.1.
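The empirical success rate procedure described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes the traces are held in a Python list, that each attack draws `n_trace` traces without replacement from that pool, and that `attack` is a caller-supplied function returning the recovered key. All names are hypothetical.

```python
import random


def empirical_success_rate(traces, attack, k_c, n_trace, repeats=1000, seed=0):
    """Proportion of repeated attacks on n_trace random traces that return k_c."""
    rng = random.Random(seed)          # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(repeats):
        subset = rng.sample(traces, n_trace)   # draw without replacement
        if attack(subset) == k_c:
            hits += 1
    return hits / repeats
```

Sweeping `n_trace` and plotting the returned values gives an empirical curve comparable to those in Fig. 3.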

In the second example, the last 6 bits are perfectly masked. For the two cases
$\vec p_2 = [p_1, p_2] = [0.3, 0.3]$ and $\vec p_2 = [0.1, 0.5]$, QMS = 0.4, but ILA = 0.16 and
0.32 respectively. Figure 3(b) shows that $ILA_{10}$ correctly predicts the different
empirical CPA success rate curves, while QMS incorrectly labels the two cases
as equally leaky. Therefore, only $ILA_{10}$ correctly measures the CPA leakage in
these examples.
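The four $ILA_{10}$ values quoted in these examples can be cross-checked with a short calculation. The sketch below assumes that each masked bit contributes $(1 - 2p_i)^2/2$ independently, which is a per-bit extension of the equal-$p$ expression $b(1 - 2p)^2/2$ derived in Appendix B; the text itself obtains these values from the general algorithms of Appendix F, so this is a consistency check under an idealized assumption rather than the method used in the paper.

```python
def ila10_independent_bits(p):
    """Assumed per-bit form: ILA_10 = (1/2) * sum_i (1 - 2 p_i)^2."""
    return 0.5 * sum((1 - 2 * p_i) ** 2 for p_i in p)


# First example: bits 5..8 are perfectly masked (p = 0.5) and contribute nothing.
case_1a = [0.5, 0.2, 0.2, 0.1] + [0.5] * 4
case_1b = [0.0, 0.4, 0.4, 0.4] + [0.5] * 4
# Second example: bits 3..8 are perfectly masked.
case_2a = [0.3, 0.3] + [0.5] * 6
case_2b = [0.1, 0.5] + [0.5] * 6

for case in (case_1a, case_1b, case_2a, case_2b):
    print(round(ila10_independent_bits(case), 2))
```

Under this assumption the values match the reported 0.68, 0.56, 0.16 and 0.32.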

The formulas (16) and (23) give the CPA success rates using ILA and SNR.
Figure 4 plots the number of traces $N_{trace}$ needed to achieve a success rate of
SR = 80%, when ILA and SNR vary. Figure 4(a) is for the first-order CPA
attack (16) and (b) is for the second-order CPA attack (23). As ILA or SNR
increases, fewer traces are needed to reach SR = 80%. For a fixed SNR value,
the number of traces $N_{trace}$ is inversely proportional to ILA.
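The inverse proportionality can be turned into a quick estimate. The sketch below is illustrative: the reference trace count of 1000 is a hypothetical number, not a measurement from the paper, and the function name is an assumption.

```python
def scaled_trace_count(n_ref, ila_ref, ila_new):
    """Traces needed at ila_new for the same SR and SNR, given n_ref traces at ila_ref."""
    return n_ref * ila_ref / ila_new


# If 1000 traces (hypothetical) suffice at ILA_10 = 0.68, the case ILA_10 = 0.56
# needs about 21% more traces for the same success rate at the same SNR.
print(round(scaled_trace_count(1000, 0.68, 0.56)))
```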

For the special case of single-bit DPA, all three metrics are monotonic
functions of each other (Property 1). Thus, $MI_A$ and QMS can predict the DPA
success rate through their relationship with ILA. In particular, for DPA,
$ILA_{10} = (1 - QMS)^2/2$ and the number of traces $N_{trace}$ needed for DPA is
inversely proportional to $(1 - QMS)^2$.


4.2 Experimental Results on Physical Implementations 

We next verify the prediction of success rates by ILA, and show that it also
correctly predicts the dominance by first-order or second-order CPA leakage on
real physical systems. Two physical implementations of masked Keccak and AES
algorithms are considered. The masked AES [1] is implemented on an SASEBO-GII
board [22]. The protected Keccak implementation with secret sharing [4] is
on the 32-bit Microblaze processor of the SASEBO-GII board. All the power
traces are collected using a LeCroy WaveRunner 640Zi oscilloscope.

Fig. 3. First-order CPA attacks under two different biased masking schemes with
SNR = 0.1: (a) with the same $MI_A$ value but different $ILA_{10}$ values; (b) with the
same QMS value but different $ILA_{10}$ values

Fig. 4. The theoretical number of traces needed for SR = 80% under first-order and
second-order CPA: (a) first-order, plotted against $ILA_{10}$; (b) second-order, plotted
against $ILA_{20}$.

We obtain several power data sets with biased masking by selecting parts
of the fully masked data set according to biased mask distributions. The first
two data sets are on the same AES implementation with $\delta_0 = 0.10$, $\delta_1 = 0.12$
but with different biased masks. The leakage amount on the first data set is
$ILA_{10} = 0.338$, $ILA_{20} = 13.8$, while the leakage amount on the second data
set is $ILA_{10} = 0.174$, $ILA_{20} = 15.7$ for CPA attacks. For the third data set on
Keccak, $\delta_0 = 0.10$, $\delta_1 = 0.10$, $ILA_{10} = 0.010$, $ILA_{20} = 0.006$ for DPA attacks.
For these three data sets, $\sqrt{ILA_{10}/ILA_{20}}/\delta_1 = 1.3, 0.88, 2.02$ respectively.
By Property 3, the first-order attack is more effective in the first and third data
sets, and the second-order attack is more effective in the second data set.

Figure 5 shows the success rates of CPAs on the first two data sets for 
AES. Each figure plots four curves, the theoretical success rates for first-order 
CPA (16) and the second-order CPA (23), and two corresponding empirical suc- 
cess rate curves. The empirical success rates are close to the theoretical success 
rates. The first-order leakage and second-order leakage are ranked in the order 
predicted by Property 3. 

Figure 6 shows that the success rates of CPA on the Keccak data are also as
predicted by Eqs. (16) and (23).



Fig. 5. The first-order CPA attack and second-order CPA attack on AES with different 
masking biases. 



Fig. 6. The first-order CPA attack and the second-order CPA attacks on Keccak data 
subset. 
5 Conclusion

In this work, we propose a new unified metric, ILA, to measure the information
leakage of cryptographic software at an early stage under different power analysis
attacks. It quantifies the leakage amount of algorithms with various masking
strengths under first-order or second-order power analysis attacks. Unlike existing
metrics, ILA relates to the attack success rate on the physical implementations
through a simple explicit formula. We demonstrate that it quantifies the leakage
amount more accurately than existing metrics on both synthetic data and
real physical implementation data. Therefore, it is a reliable metric for
system designers to predict the system leakage and develop better protections.

Acknowledgments. This work is supported in part by the National Science Founda- 
tion under grants CNS-1314655 and CNS-1337854. 


Appendices 


A Derivation of ILA, QMS and $MI_A$ for DPA Model

For the DPA model, Z is one single bit, as well as M. Under Assumption 1,
$P(Z = 0) = P(Z = 1) = 1/2$. For the Boolean masking, $V_0 = F = Z \oplus M$. Hence
$P(Z \oplus M = 0) = P(Z \oplus M = 1) = 1/2$,

    P(Z \oplus M = 1 \mid Z) = (1 - 2p) Z + p = p \text{ or } 1 - p.    (28)

Using Eq. (28), $D_{x,k}(F) = p$ or $1 - p$, which implies $\max\{|D_{x,k}(F) - D_{x',k'}(F)|\} = |1 - 2p|$.
Hence $QMS = 1 - |1 - 2p|$.

For $MI_A$, we calculate the entropies first.

    H(K) = -\sum_{k \in \mathcal{K}} P(k) \log_2 P(k) = -\sum_{k \in \mathcal{K}} \frac{1}{N_k} \log_2 \frac{1}{N_k} = \log_2 N_k.

    H(K \mid V_0) = -\sum_{k \in \mathcal{K}} P(k) \sum_{x \in \{0,1\}} P(x) \sum_{v_0 \in \{0,1\}} P(v_0 \mid k, x) \log_2 P(k \mid v_0, x)

                  = -\sum_{k \in \mathcal{K}} \frac{1}{N_k} \sum_{x \in \{0,1\}} \frac{1}{2} \left[ p \log_2 \frac{2p}{N_k} + (1 - p) \log_2 \frac{2(1 - p)}{N_k} \right]

                  = \log_2 N_k - [1 + (1 - p) \log_2 (1 - p) + p \log_2 p].

Therefore,

    MI_A = H(K) - H(K \mid V_0) = 1 + (1 - p) \log_2 (1 - p) + p \log_2 p.    (29)




We will derive the $ILA_{10}$ and $ILA_{20}$ expressions in the CPA model in
Appendix B. Plugging in b = 1, we get their DPA model expressions.
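For a single masked bit, the three DPA metrics can be evaluated side by side. The sketch below is illustrative: it uses $QMS = 1 - |1 - 2p|$ and Eq. (29) from this appendix, together with $ILA_{10} = (1 - 2p)^2/2$, which is the b = 1 case of the Appendix B result. The function name is hypothetical, and the convention $0 \log_2 0 = 0$ is assumed at the endpoints.

```python
import math


def dpa_metrics(p):
    """Return (QMS, MI_A, ILA_10) for a single masked bit with P(M = 1) = p."""
    def plog2(q):
        # q * log2(q) with the convention 0 * log2(0) = 0
        return q * math.log2(q) if q > 0 else 0.0

    qms = 1 - abs(1 - 2 * p)
    mi_a = 1 + plog2(1 - p) + plog2(p)
    ila10 = (1 - 2 * p) ** 2 / 2
    return qms, mi_a, ila10


for p in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(p, dpa_metrics(p))
```

As p moves from 0 to 0.5, QMS increases while $MI_A$ and $ILA_{10}$ decrease, and $ILA_{10} = (1 - QMS)^2/2$ holds at every point, in line with the relation quoted in Sect. 4.1.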

B Derivation of $ILA_{10}$ and $ILA_{20}$ for CPA Model

For the CPA model, the selection is Hamming weights $V_0 = H(Z \oplus M)$, $V_1 = H(M)$,
and both M and Z are b-bit variables. Since $P(M^{(i)} = 1) = p$, $i = 1, 2, \ldots, b$,
we have:

    E_M[H(M)] = bp, \quad E_M[H(M)^2] = bp + b(b - 1) p^2.    (30)

Under Assumption 1, Z has uniform distribution for any key $k_g$ so that always

    E_X[H(Z)] = b/2, \quad E_X[H(Z)^2] = (b^2 + b)/4.    (31)

Here $V_0^*(X, k) = H[Z(X, k)]$. Under Assumptions 1 and 3,

    E_{k_g} \kappa(k_c, k_g) = E_{k_g} E_X\{[V_0^*(X, k_c) - V_0^*(X, k_g)]^2\}
      = E_{k_g} E_X[V_0^*(X, k_c)^2] + E_{k_g} E_X[V_0^*(X, k_g)^2] - 2 E_{k_g} E_X[V_0^*(X, k_c) V_0^*(X, k_g)]
      = 2 E_X[V_0^*(X, k_c)^2] - 2 \{E_X[V_0^*(X, k_c)]\}^2.    (32)

Using (31), this becomes

    E_{k_g} \kappa(k_c, k_g) = 2 \left( \frac{b^2 + b}{4} \right) - 2 \left( \frac{b}{2} \right)^2 = \frac{b}{2}.    (33)

By the property 2 in [19], with $\wedge$ denoting the bit-wise multiplication,

    E_M[H(Z \oplus M) \mid (X, k_c)] = E_M[H(Z) + H(M) - 2 H(Z \wedge M) \mid (X, k_c)]
      = (1 - 2p) H(Z) + bp.    (34)

Then for the first-order CPA, using Eqs. (34) and (33),

    ILA_{10} = E_{k_g}[\kappa_{10}(k_c, k_g)]
      = E_{k_g}[E_X\{[E_M(H(Z \oplus M) \mid (X, k_c)) - E_M(H(Z \oplus M) \mid (X, k_g))]^2\}]
      = E_{k_g}[(1 - 2p)^2 \kappa(k_c, k_g)] = \frac{b (1 - 2p)^2}{2}.    (35)

Similar to (34), using (30),

    E_M\{[H(Z \oplus M) - b/2][H(M) - bp] \mid (X, k_c)\}
      = E_M\{[H(Z \oplus M) H(M) - bp H(Z \oplus M)] \mid (X, k_c)\}
      = E_M\{[H(Z) H(M) + H(M)^2 - 2 H(Z \wedge M) H(M) - bp H(Z \oplus M)] \mid Z\}
      = H(Z) bp + [bp + b(b - 1) p^2] - 2 [p + (b - 1) p^2] H(Z) - bp [(1 - 2p) H(Z) + bp]
      = -2p(1 - p) [H(Z) - b/2].    (36)

Hence for the second-order CPA, using Eqs. (36) and (33),

    ILA_{20} = E_{k_g}[\kappa_{20}(k_c, k_g)]
      = E_{k_g}[E_X\{[E_M(\tilde V_0^{c} \tilde V_1 \mid (X, k_c)) - E_M(\tilde V_0^{g} \tilde V_1 \mid (X, k_g))]^2\}]
      = E_{k_g}[4 p^2 (1 - p)^2 \kappa(k_c, k_g)] = 2b p^2 (1 - p)^2.    (37)
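C Theorem 1: $\mu$ and $\Sigma$ in the first-order CPA (12)

Denote $v^{g}_{m,i,0} = V_0(x_i, k_g, m)$ and $v^{c}_{i,0} = V_0(x_i, k_c, m_i)$. Recall that, under
Assumption 1, $E[v^{g}_{m,1,0}] = E[V_0^{g}] = E[V_0^{c}]$ and $E_X\{[E_M(\tilde v^{g}_{m,1,0})]^2\} =
E_X\{[E_M(\tilde v^{c}_{m,1,0})]^2\}$ for any $k_g$. Hence we have a useful expression that will
be used later,

    E_X\{E_M(\tilde v^{c}_{m,1,0}) [E_M(\tilde v^{c}_{m,1,0}) - E_M(\tilde v^{g}_{m,1,0})]\}
      = \frac{1}{2} E_X\{[E_M(\tilde v^{c}_{m,1,0})]^2 + [E_M(\tilde v^{g}_{m,1,0})]^2 - 2 E_M(\tilde v^{c}_{m,1,0}) E_M(\tilde v^{g}_{m,1,0})\}    (38)
      = \frac{1}{2} E_X\{[E_M(\tilde v^{c}_{m,1,0}) - E_M(\tilde v^{g}_{m,1,0})]^2\} = \frac{1}{2} \kappa_{10}(k_c, k_g).

For large n, $\bar l_0 = c_0 + \epsilon_0 E(v_{1,0})$ and $l_{1,0} = c_0 + \epsilon_0 v_{1,0} + \sigma_0 r_{1,0}$, then Eq. (11)
becomes

    \Delta_1(k_c, k_g) = \{\delta_0 [v_{1,0} - E(v_{1,0})] + r_{1,0}\} [E_M(\tilde v^{c}_{m,1,0}) - E_M(\tilde v^{g}_{m,1,0})].    (39)

Since $E[r_{1,0}] = 0$, we have:

    \mu_{k_g} = \delta_0 E\{(v_{1,0} - E[v_{1,0}]) (E_M[\tilde v^{c}_{m,1,0}] - E_M[\tilde v^{g}_{m,1,0}])\}
      = \delta_0 E\{\tilde v_{1,0} (E_M[\tilde v^{c}_{m,1,0}] - E_M[\tilde v^{g}_{m,1,0}])\} = \frac{\delta_0}{2} \kappa_{10}(k_c, k_g).    (40)

The last equality uses the fact that $E_M[\tilde v_{1,0}] = E_M[\tilde v^{c}_{m,1,0}]$ and Eq. (38).

The element in covariance $\Sigma$ corresponding to $k_{gi}$ and $k_{gj}$ is:

    \sigma_{k_{gi},k_{gj}} = Cov(\Delta_1(k_c, k_{gi}), \Delta_1(k_c, k_{gj})) = E[\Delta_1(k_c, k_{gi}) \Delta_1(k_c, k_{gj})] - \mu_{k_{gi}} \mu_{k_{gj}}.    (41)

Since $E[r_{1,0}^2] = 1$, keeping the leading term (dropping the terms with $\delta_0$), we have

    \sigma_{k_{gi},k_{gj}} = E_X\{(E_M[\tilde v^{c}_{m,1,0}] - E_M[\tilde v^{gi}_{m,1,0}]) (E_M[\tilde v^{c}_{m,1,0}] - E_M[\tilde v^{gj}_{m,1,0}])\} = \kappa_{10}(k_c, k_{gi}, k_{gj}).    (42)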


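D Proof of Lemma 1

Similar to the derivation of (38),

    \kappa_{10}(k_c, k_{gi}, k_{gj})
      = E_X\{(E_M[\tilde v^{c}_{m,1,0}] - E_M[\tilde v^{gi}_{m,1,0}]) (E_M[\tilde v^{c}_{m,1,0}] - E_M[\tilde v^{gj}_{m,1,0}])\}
      = E_X\{(E_M[\tilde v^{c}_{m,1,0}])^2 - E_M[\tilde v^{c}_{m,1,0}] E_M[\tilde v^{gi}_{m,1,0}]
             - E_M[\tilde v^{c}_{m,1,0}] E_M[\tilde v^{gj}_{m,1,0}] + E_M[\tilde v^{gi}_{m,1,0}] E_M[\tilde v^{gj}_{m,1,0}]\}
      = \frac{1}{2} E_X\{(E_M[\tilde v^{c}_{m,1,0}] - E_M[\tilde v^{gi}_{m,1,0}])^2 + (E_M[\tilde v^{c}_{m,1,0}] - E_M[\tilde v^{gj}_{m,1,0}])^2
             - (E_M[\tilde v^{gi}_{m,1,0}] - E_M[\tilde v^{gj}_{m,1,0}])^2\}
      = \frac{1}{2} [\kappa_{10}(k_c, k_{gi}) + \kappa_{10}(k_c, k_{gj}) - \kappa_{10}(k_{gi}, k_{gj})].    (43)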
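E Theorem 2: $\mu$ and $\Sigma$ in the Second-order CPA (12)

For large sample n, $\bar l_j = c_j + \epsilon_j E[v_{i,j}]$, then $l_{i,j} - \bar l_j = \epsilon_j \tilde v_{i,j} + \sigma_j r_{i,j}$, $j = 0, 1$,
where $\tilde v_{i,j} = v_{i,j} - E(v_{i,j})$ are the centered versions of $v_{i,0} = V_0(x_i, k_c, m_i)$ and
$v_{i,1} = V_1(m_i)$. Similarly, let $\tilde v^{c}_{m,i,0}$, $\tilde v^{g}_{m,i,0}$ and $\tilde v_{m,i}$ denote the centered versions
of the corresponding quantities $v^{c}_{m,i,0}$, $v^{g}_{m,i,0}$ and $v_{m,i}$. We have

    \Delta_1(k_c, k_g) = (\delta_0 \tilde v_{1,0} + r_{1,0}) (\delta_1 \tilde v_{1,1} + r_{1,1}) (E_M[\tilde v^{c}_{m,1,0} \tilde v_{m,1}] - E_M[\tilde v^{g}_{m,1,0} \tilde v_{m,1}]).    (44)

Since $E[r_{1,0}] = E[r_{1,1}] = 0$,

    \mu_{k_g} = \delta_0 \delta_1 E\{\tilde v_{1,0} \tilde v_{1,1} (E_M[\tilde v^{c}_{m,1,0} \tilde v_{m,1}] - E_M[\tilde v^{g}_{m,1,0} \tilde v_{m,1}])\}
      = \delta_0 \delta_1 E_X\{E_M\{\tilde v_{1,0} \tilde v_{1,1} (E_M[\tilde v^{c}_{m,1,0} \tilde v_{m,1}] - E_M[\tilde v^{g}_{m,1,0} \tilde v_{m,1}])\}\}.    (45)

By Assumption 1, $E[\tilde v_{1,0} \tilde v_{1,1}] = E[\tilde v^{g}_{m,1,0} \tilde v_{m,1}] = E[\tilde V_0 \tilde V_1]$. Similar to the
derivation of (38),

    \mu_{k_g} = \delta_0 \delta_1 E_X\{E_M[\tilde v_{1,0} \tilde v_{1,1}] (E_M[\tilde v^{c}_{m,1,0} \tilde v_{m,1}] - E_M[\tilde v^{g}_{m,1,0} \tilde v_{m,1}])\}
      = \frac{\delta_0 \delta_1}{2} E_X\{\{E_M[\tilde v^{c}_{m,1,0} \tilde v_{m,1}] - E_M[\tilde v^{g}_{m,1,0} \tilde v_{m,1}]\}^2\}    (46)
      = \frac{\delta_0 \delta_1}{2} \kappa_{20}(k_c, k_g).

The element in covariance $\Sigma$ corresponding to $k_{gi}$ and $k_{gj}$ is:

    \sigma_{k_{gi},k_{gj}} = Cov(\Delta_1(k_c, k_{gi}), \Delta_1(k_c, k_{gj})) = E[\Delta_1(k_c, k_{gi}) \Delta_1(k_c, k_{gj})] - \mu_{k_{gi}} \mu_{k_{gj}}.    (47)

Since $E[r_{1,0}^2] = E[r_{1,1}^2] = 1$, the leading term (dropping terms with $\delta_0$ or $\delta_1$) is

    \sigma_{k_{gi},k_{gj}}
      = E_X\{(E_M[\tilde v^{c}_{m,1,0} \tilde v_{m,1}] - E_M[\tilde v^{gi}_{m,1,0} \tilde v_{m,1}]) (E_M[\tilde v^{c}_{m,1,0} \tilde v_{m,1}] - E_M[\tilde v^{gj}_{m,1,0} \tilde v_{m,1}])\}
      = \kappa_{20}(k_c, k_{gi}, k_{gj}).    (48)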


F Algorithms for Calculating $ILA_{10}$

Here, we describe the algorithm for computing $ILA_{10}$ when the mask distribution
is known. Algorithm 1 builds the probability distribution of the mask from the
known probability of each masking bit. Algorithm 2 calculates the first-order
information leakage amount based on this probability distribution. These
algorithms are used to calculate the $ILA_{10}$ values in Sect. 4.1.

Algorithm 1. Probability Distribution of Mask
Input: Probability distribution of masking bits $\vec p$
Output: Probability distribution of mask $f_M$
N_m ← size of mask space |M|
N_bit ← size of byte |$\vec p$|
for m = 0 → N_m − 1 do
    f_M[m] ← 1
    for i = 0 → N_bit − 1 do
        if m(i) = 1 then                      ▷ m(i) is the (i + 1)th bit of m
            f_M[m] ← f_M[m] * p_i             ▷ p_i is the (i + 1)th value of $\vec p$
        end if
        if m(i) = 0 then
            f_M[m] ← f_M[m] * (1 − p_i)
        end if
    end for
end for

Algorithm 2. Calculation of $ILA_{10}$
Input: Correct key k_c, probability distribution of mask f_M, intermediate value V
       (an N_k × N_x × N_m dimension matrix)
Output: ILA_10
N_k ← size of key space |K|
N_x ← size of plaintext (ciphertext) space |X|
N_m ← size of mask space |M|
ILA_10 ← 0
for k_g = 0 → N_k − 1 do
    E_2[k_g] ← 0
    for x = 0 → N_x − 1 do
        E_1[k_g][x] ← 0
        for m = 0 → N_m − 1 do
            E_1[k_g][x] ← E_1[k_g][x] + (V[k_c][x][m] * f_M[m] − V[k_g][x][m] * f_M[m])
        end for
        E_2[k_g] ← E_2[k_g] + E_1[k_g][x] * E_1[k_g][x] * (1/N_x)     ▷ E_2[k_g] = κ_10(k_c, k_g)
    end for
    ILA_10 ← ILA_10 + E_2[k_g] * 1/(N_k − 1)
end for
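Algorithms 1 and 2 translate almost line by line into Python. The sketch below is a minimal illustration under stated assumptions: to keep the example small it uses a 4-bit toy setting with the PRESENT S-box as a stand-in for the AES S-box, the intermediate value is taken to be the Hamming weight of the masked S-box output, and the helper `toy_intermediate_values` is hypothetical. None of these choices come from the paper.

```python
# 4-bit PRESENT S-box, used here only as a small stand-in for the AES S-box.
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def mask_distribution(p):
    """Algorithm 1: P(M = m) for independent mask bits, p[i] = P(bit i of m is 1)."""
    n_bits = len(p)
    f_m = []
    for m in range(1 << n_bits):
        prob = 1.0
        for i in range(n_bits):
            prob *= p[i] if (m >> i) & 1 else 1.0 - p[i]
        f_m.append(prob)
    return f_m


def hamming_weight(v):
    return bin(v).count("1")


def toy_intermediate_values(sbox):
    """V[k][x][m] = HW(S(x xor k) xor m), the masked S-box output (assumed model)."""
    size = len(sbox)
    return [[[hamming_weight(sbox[x ^ k] ^ m) for m in range(size)]
             for x in range(size)] for k in range(size)]


def ila10(k_c, f_m, v):
    """Algorithm 2: average of kappa_10(k_c, k_g) over the N_k - 1 wrong keys."""
    n_k, n_x, n_m = len(v), len(v[0]), len(f_m)
    total = 0.0
    for k_g in range(n_k):
        kappa = 0.0                      # kappa_10(k_c, k_g); zero when k_g == k_c
        for x in range(n_x):
            diff = sum((v[k_c][x][m] - v[k_g][x][m]) * f_m[m] for m in range(n_m))
            kappa += diff * diff / n_x
        total += kappa / (n_k - 1)
    return total


V = toy_intermediate_values(PRESENT_SBOX)
for p in (0.0, 0.2, 0.5):
    print(p, ila10(3, mask_distribution([p] * 4), V))
```

With perfect masking (p = 0.5) the leakage amount vanishes, and for equal bit probabilities the result scales as $(1 - 2p)^2$ times the unmasked value, matching the structure of Eq. (35).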
Abstract. While traditionally cryptographic algorithms have been
designed with the black-box security in mind, they often have to deal 
with a much stronger adversary - namely, an attacker that has some 
access to the execution environment of a cryptographic algorithm. This 
can happen in such grey-box settings as physical side-channel attacks or 
digital forensics as well as due to Trojans. 

In this paper, we aim to address this challenge for symmetric-key
cryptography. We study the security of the Advanced Encryption Stan- 
dard (AES) in the presence of explicit leakage: We let a part of the inter- 
nal secret state leak in each operation. We consider a wide spectrum 
of settings - from adversaries with limited control all the way to the 
more powerful attacks with more knowledge of the computational plat- 
form. To mount key recoveries under leakage, we develop several novel 
cryptanalytic techniques such as differential bias attacks. Moreover, we
demonstrate and quantify the effect of uncertainty and implementation 
countermeasures under such attacks: black-boxed rounds, space random- 
ization, time randomization, and dummy operations. We observe that the 
residual security of AES can be considerable, especially with uncertainty 
and basic countermeasures in place. 


Keywords: Grey-box • Side-channel attacks • Leakage • AES • Bitwise 
multiset attacks • Differential bias attacks • Malware • Mass surveillance 


1 Introduction 

1.1 Background: Black Box, Grey Box and White Box 

It is symmetric-key algorithms that are in charge of bulk data encryption and
authentication in the field. Many widespread applications such as
mobile networks, access control, banking, content protection, and storage
encryption often feature only symmetric-key algorithms, with no public-key
cryptography involved.

Traditionally, the security of symmetric-key cryptographic primitives has
been analyzed in the black-box model, where the adversary is mainly limited
to observing and manipulating the inputs and outputs of the algorithm, the
related-key model [2] being a notable extension. Multiple techniques have been
extensively elaborated upon, such as differential and linear cryptanalysis, integral
and algebraic attacks, to name a small subset of the cryptanalytic tools available
today. Cryptographers have excelled at preventing those by design [8].

In the late 1990s, with the introduction of timing attacks [13] by Kocher,
differential fault analysis [1] by Boneh, DeMillo and Lipton, simple power analysis
as well as differential power analysis [14] by Kocher, Jaffe and Jun, the research
community became aware of side-channel attacks that operate in the grey-box
model: Now the attacker has access to the physical parameters of cryptographic
implementations or can even inject faults into their execution. Numerous
countermeasures have been proposed to hamper those attacks, providing a practical
level of security in many cases.

Since the mid-2000s, a trend of side-channel analysis has been towards analytical
side-channel attacks that assume leakage of fixed values of variables instead
of stochastic variables and whose techniques border on black-box cryptanalysis.
For instance, collision attacks [22] by Schramm et al. observe an equation within
one or several executions of an algorithm. Algebraic side-channel attacks [21] by
Renauld and Standaert work under the assumption that the attacker can see the
Hamming weight of the internal variables of an algorithm. The attacker uses the
techniques of algebraic cryptanalysis to solve the systems of nonlinear equations
arising from collisions and algebraic side-channel attacks [5,19,20]. Dinur and
Shamir [9] apply integral and cube attacks to block ciphers in a setting where a
fixed bit after a round is leaked due to physical probing, power analysis or
similar. Differential fault analysis also uses elements of differential cryptanalysis.

As an extreme development of the grey-box setting, the white-box model [7] 
by Chow et al. poses the assumption that the adversary has full control over 
the implementation of the cryptographic algorithm. The major goal of white- 
box cryptography is to protect the confidentiality of secret keys in such a white- 
box environment. However, all published white-box implementations of standard 
symmetric-key algorithms such as AES to date have been practically broken in 
this model [18]. The white-box setting may be too strong for standard symmetric- 
key algorithms such as AES, because such a cipher was designed with the black- 
box security in mind. 

1.2 Leakage and AES 

In this paper, we enhance the Dinur-Shamir setting [9] and aim to bridge the 
gap between the physical side-channel attacks, the techniques of provable leakage 
resilience [17] and white-box setting (dealing with attackers too hard to protect 
against). Namely, we let the AES implementation leak some information during 
its execution which is defined as follows. 

Definition 1 (Leakage Model). A malicious agent leaks a part of the interme- 
diate internal secret state (including the key state) of a cryptographic algorithm 
in each algorithm execution. 

To apply this setting to AES (we will talk about AES-128 most of the time), we
make it more concrete and fix several important parameters of the leak:

Frequency: There is a single leak per encryption/decryption. This simplifies
complexity estimations in our analysis. If more leaks are available in each
execution, the complexities can be adjusted accordingly.

Granularity: A leak can only happen after a full round. This situation
corresponds, e.g., to a 32-bit serial or round-based hardware implementation of
AES or a software implementation using an instruction set extension such as
AES-NI available on most Intel/AMD CPUs or the Cryptography Extension
on ARMv8.

Knowledge: The attacker does not have any knowledge of the location of leaked
bits, i.e., the attacker does not know the bit position or the round number of
the leaked bits. The attacker also does not know whether the leak is from the
key schedule or the data processing part. This circumstance models the limited
control of the adversary over the platform.

We let several parameters vary in our analysis 1 : 

Time and space: The location of the leak in terms of the round number (time) 
and bit position within the round (space) can either be fixed or vary. 

Known/chosen plaintext/ciphertext: We consider both known and chosen text
models. In case of a passive attacker, we talk about the known text setting. 
Otherwise, the attacker is allowed to choose text. 

Alignment: We consider single-bit leaks, byte leaks and multiple-bit leaks. While 
single-bit leaks are more likely to happen due to physical probing, byte leaks 
correspond more to software settings. 


1.3 Our Contributions 

The contributions of this paper are as follows. The cryptanalytic results are also
summarized in Table 1.

AES Under Basic Leakage and Bitwise Multiset Attacks. We develop a 
bitwise multiset attack, which exploits relations of sets of plaintexts and internal 
states, to evaluate the security of AES if the time and space of the leakage is 
fixed. Our attack utilizes a bitwise multiset characteristic which is an extension 
of Dinur-Shamir integral attacks [9]. Unlike their attacks, our attack is feasible 
even if an attacker does not have any knowledge of the location of leaked bits. 
See Sect. 2 and Table 1 for the details. 

1 Further models are worth consideration as well. For instance, the Dinur-Shamir
model of the side-channel cube attacks [9] can be seen as a special case of our
leakage model, with the following differences: First, in the Dinur-Shamir model, the
adversary knows the location of the leak. Second, the Dinur-Shamir model does not
consider leaks of more than a single bit. Third, Dinur and Shamir do not allow for
leaks from the key schedule. Finally, the time and location of a leak are fixed,
while we allow for time and space uncertainty in our consideration.

Fig. 1. AES with space and/or time uncertainty


AES Under Leakage with Space/Time Uncertainty and Differential 
Bias Attacks. We let time, space or both be randomized. The space random- 
ization makes the position of leaked bits random in each execution. The time 
randomization makes the round number of leaked bits random in each execution. 
A combination of time and space randomization is also an advanced model we 
consider. See Fig. 1 for an illustration. 

This setting takes account of a more realistic environment, such as the lack 
of knowledge of the implementation, and the presence of countermeasures. Here, 
our multiset attacks are infeasible, as no clean multiset is available. To cope 
with that, we develop a differential bias attack and a biased state attack inspired 
by techniques for distinguishing attacks against stream ciphers [15,16,23]. More 
specifically, by properly choosing differences and values of plaintexts, we create 
biased (differential) states, where the distribution of bitwise differences or values
is strongly biased only if the key is correctly guessed. Thus, we are able to dis- 
tinguish the leak corresponding to the correct key. See Sect. 3 for the techniques 
as well as Sect. 4 and Table 1 for the results. 

AES Under Noisy Leakage. We consider leakage with noise, where the attacker 
does not know exactly if the variable it accesses corresponds to the execution of 
the algorithm under attack. For example, it can be the case if multiple instances of 
encryption (with different keys) are run simultaneously or if the implementation 
uses dummy operations to hide the AES execution. The differential bias attack 
remains applicable in this setting, with adjusted complexities. See Sect. 6 and 
Table 1 for details. To characterize noise, we define π to be the probability that
the leak is correctly read. The complexities of our attacks grow quadratically with
the increase of 1/π.

Further Results. We discuss the applicability of our attacks to AES-192 and
AES-256, multiple-bit leakage, and other granularities of the leaks in Sect. 8.

Table 1. Security of AES-128 under leakage in various settings

*1: 32-bit partial key recovery attack, *2: 8-bit partial key recovery attack
BB round(s): Black-boxed round(s), KP: Known Plaintext, CP: Chosen Plaintext
CC: Chosen Ciphertext, MA: Multiset Attack, DBA: Differential Bias Attack
BSA: Biased State Attack, π is the probability to read a correct leak


Our Observations and Recommendations. To summarize the residual security
of AES under leakage in the various settings, we observe the following. First,
if no rounds are black-boxed and all intermediate internal states are visible
to the attacker, there are practical attacks, even with uncertain time and space.
Second, to approach practical infeasibility of attacks in our leakage model without
black-boxing, a substantial level of noise is needed: π = 2^-10 and lower
when combined with randomized time and space.

On the other hand, the black-boxing of round 9 is very effective. Indeed, if
round 9 is black-boxed (see footnote 2), i.e., when the state between round 9 and
round 10 is invisible to the attacker, the complexities of our attacks grow beyond
2^44 even with fixed time and space. Third, if uncertainty in time and space is
combined with the black-boxed 9th round, our attacks require more than 2^58
operations, even with clean leaks. Then, if more rounds (1, 2, 8, and 9) are
black-boxed, the complexities increase to 2^72. If noise is applied as a
countermeasure on top of that, it is possible to attain security levels of 2^80 and
beyond against our attacks.

Thus, black-boxed round 9, noise or both are needed to hamper our attacks
at a practical security level under leakage. Note that a high-budget organization
can practically afford an attack of complexity 2^80 and higher [12]. However, the
countermeasures considered here may still be effective against a mass surveillance
attacker.

2 E.g., partly unrolled hardware implementations aimed to reduce latency [6] may
have this property.

2 Preliminaries 

This section fixes the AES notation that we will use throughout the paper and
describes the leakage attack by Dinur and Shamir on AES as a starting point.


2.1 Notations of AES 


AES is a block cipher with a 128-bit internal state and a 128/192/256-bit key
K, referred to as AES-128, AES-192 and AES-256, respectively. In most parts of
this paper, we refer to AES-128 whenever speaking of AES. The internal state is
represented by a 4 x 4 byte matrix, and the key is represented by a 4 x 4 / 4 x 6 /
4 x 8 matrix. For example, a 4 x 4 internal state consisting of 16 byte cells is
expressed as follows.

        s0  s4  s8   s12
    S = s1  s5  s9   s13
        s2  s6  s10  s14
        s3  s7  s11  s15

AES consists of a data processing part and a key schedule. The data processing 
part adopts a substitution-permutation network whose round function consists 
of four layers: SubBytes, ShiftRow, MixColumns and AddRoundKey. SubBytes is 
a nonlinear transformation applying an 8-bit S-box to each cell. ShiftRow rotates
bytes in row r by r positions to the left. MixColumns is a linear transforma- 
tion applying a 4 x 4 diffusion matrix with branch number 5 to each column. 
AddRoundKey adds a 128-bit subkey to a 128-bit state by an XOR operation. 
Note that AddRoundKey is also performed before the first round as whitening 
and that MixColumns is omitted in the last round. Subkeys are generated by a 
key schedule. For the details of the key schedule of AES, we refer to [11]. 
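The byte numbering and the ShiftRow rule above can be illustrated with a minimal sketch. It assumes the column-major numbering of the state matrix shown earlier (byte i sits in row i mod 4 and column i div 4); the function name shift_rows is ours and not part of the paper. Using byte indices as stand-in values shows which bytes end up in the first column, which is the reason why the byte positions 0, 5, 10 and 15 recur in the attacks below.

```python
def shift_rows(state):
    """Rotate row r of a 16-byte, column-major state left by r positions."""
    out = [0] * 16
    for col in range(4):
        for row in range(4):
            # The byte arriving at (row, col) comes from column (col + row) mod 4.
            out[4 * col + row] = state[4 * ((col + row) % 4) + row]
    return out


indices = list(range(16))        # byte indices used as stand-in values
shifted = shift_rows(indices)
first_column = shifted[0:4]      # bytes that land in column 0 after ShiftRow
```

Applying shift_rows four times returns the original state, since each row is rotated by a multiple of four positions in total.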

Two types of internal states in each round of AES-128 are defined as follows: 
#1 is the state before SubBytes in round 1, #2 is the state after MixColumns 
in round 1, #3 is the state before SubBytes in round 2, ..., #19 is the state 
before SubBytes in round 10, and #20 is the state after ShiftRow in round 10 
(MixColumns is omitted in the last round). The states in the last round of 
AES-192 are addressed as #23 and #24, and of AES-256 as #27 and #28. We let 
#0 be a plaintext and #21, #25 and #29 be a ciphertext in AES-128/192/256, 
respectively. The 128-bit subkeys are denoted as $0, $1, ..., and so on. The i-th 
byte in the state x is denoted as x_i and the j-th bit in x_i is represented as 
x_i[j]. 

2.2 Dinur-Shamir Chosen-Plaintext Attack on AES-128 with 
Leakage 

As a starting point of our analysis, we outline the leakage attack proposed by 
Dinur and Shamir in [9]. As explained above, the Dinur-Shamir model is 
different from our leakage models as the adversary knows the time (round number) 
and space (bit position inside the round) of the leak, only single-bit leaks are 
considered there, and no leaks from the key schedule are allowed. 

In the attack of [9], one uses the following multiset properties of a byte: in set 
A, all 2^8 values appear exactly once; in set C, all 2^8 values are fixed to a 
constant; in set B, the XOR sum of all 2^8 values is zero; a set U is none of A, C 
or B. Let an N-round attack be an attack based on leaked bits after the N-th round 
function, e.g., a 2-round attack is based only on leaked bits of #5. 
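As an illustration of the four multiset properties, the following sketch classifies a multiset of 2^8 byte values. The function name classify and the order of the checks are our own choices; note that A and C sets also have a zero XOR sum, so they are tested before B.

```python
from functools import reduce
from operator import xor


def classify(values):
    """Return 'A', 'C', 'B' or 'U' for a multiset of 256 byte values."""
    distinct = len(set(values))
    if distinct == 256:
        return "A"                       # every value appears exactly once
    if distinct == 1:
        return "C"                       # all values equal one constant
    if reduce(xor, values, 0) == 0:
        return "B"                       # balanced: XOR sum is zero
    return "U"                           # none of the above


examples = {
    "A": list(range(256)),
    "C": [0x3A] * 256,
    "B": [1, 1] + [0] * 254,
    "U": [1] + [0] * 255,
}
labels = {name: classify(vals) for name, vals in examples.items()}
```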

In the first step, the attacker guesses 4 bytes of the key $0, and chooses a set 
of 2^8 plaintexts, so that #2 consists of a Λ-set in which only one byte is A and 
the other 15 bytes are C. If the 4 bytes of $0 are correctly guessed, #5 consists 
of 4 bytes of A and 12 bytes of C, while for a wrong key, all bytes in #5 become U. 
Thus, by checking whether all 2^8 values of #5 are fixed, an attacker is able to 
sieve wrong keys after 2^32 operations. The procedure can be repeated three times 
with three of the 4-byte sets of the key $0, depending on the position of the 
leaked bit. The remaining 4 bytes of $0 are exhaustively searched. Time complexity 
is estimated as 2^42 (≈ 2^32 x 2^8 x 3) encryptions and the required data is 2^34 
(= 2^32 x 4) chosen plaintexts. The work [9] also proposes other types of 2-round 
attack using cubes, with a time complexity of 2^35. However, the details are not 
given. 

The paper [9] mentions that 3- and 4-round attacks are possible using 
similar techniques but omits the details. Since A spreads over the whole state 
after 3 rounds even if the key is correctly guessed, the 2-round attack at least 
has only limited applicability to 3 and 4 rounds. 

3 AES Under Leakage with Fixed Time and Space 

In this section, we present new key recovery attacks on AES under leakage with 
fixed time and space. That is, a bit of the internal state is leaked whose location 
(round and bit position) is unknown but fixed for the entire attack. Our attack is 
an extension of the Dinur-Shamir integral attacks [9]. While their attack requires 
the location of leaked bits in advance, our attack is feasible even if an attacker 
does not have any knowledge of it. First, we describe a technique to detect 
whether leaked bits come from the key schedule or the data transformation, and 
show that leaked bits from the key schedule are of very limited use for a key 
recovery attack in this setting. Then we introduce key recovery attacks based on 
leaked bits from the data transformation. Our attacks utilize a bitwise multiset 
characteristic. 

Formalization of Fixed Time and Space. The fixed (unknown) location 
setting assumes that each execution of encryption leaks only one bit of the 
internal state at the fixed location. Specifically, leaked bits are assumed to 
come from internal states after each round function of the data processing part, 
#3, #5, ..., #19, or from each state of the key schedule (i.e., subkeys), 
$0, $1, ..., $10, at the fixed position of the fixed round in each encryption, 
e.g., #9_11[2] or $5_5[5]. The adversary is able to access the encryption function 
with known/chosen plaintexts/ciphertexts and obtain the corresponding leaked bits. 

Leakage From Key Schedule. The states in the key schedule, $0, $1, ..., 
$10, are deterministic with respect to the value of the key, i.e., if a key is fixed, 
all states of the key schedule are fixed independently of the values of plaintexts. 
On the other hand, the states in the data processing part depend on the values 
of plaintexts. This difference allows us to detect whether leaked bits come from 
the key schedule. More specifically, we encrypt N different plaintexts and obtain 
N leaked bits. If all N bits are the same, they come from the key schedule with 
probability (1 - 2^{-N}). 

If the leaked bits come from the key schedule, the attacker is able to get, 
information-theoretically, at most one bit of subkey information, as each 
encryption leaks the same state information at the fixed location. In addition, 
since the attacker does not know where the leaked bits come from, the information 
gained from them is negligible. Therefore, we will focus on the case where leaked 
bits come from the data processing part in the following. 


3.1 Bitwise Multiset Characteristic 

Our attacks utilize the following bitwise multiset property in the data transform. 


Proposition 1 (Bitwise Zero-Sum Property). If only one byte of #2 is A 
and the other 15 bytes are C (a Λ-set), the bitwise XOR-sum of the 2^8 multiset of 
any bit in #3 to #10 is zero. 

Proof. As shown in Fig. 2, if #2 consists of a Λ-set, #3 is also a Λ-set, and #5 
consists of 4 bytes of A and 12 bytes of C. Then, #7 and #9 consist of 16 bytes 
of A and B, respectively. In the 2^8 multiset of each bit of A, C and B, the XOR 
sum becomes zero [4]. □ 
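The states named in the proof can be checked experimentally with a small sketch. The code below builds the AES S-box, pushes a Λ-set placed at #2 through rounds 2 to 4 and records the XOR sum of every byte of #3, #5, #7 and #9. The random constants and the random stand-ins for the subkeys $1 to $4 are assumptions made for illustration (the property does not depend on the key schedule), and all function names are ours.

```python
import random
from functools import reduce
from operator import xor


def gmul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def make_sbox():
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):                 # x^254 is the inverse (0 maps to 0)
            inv = gmul(inv, x)
        s = inv
        for k in range(1, 5):                # affine map: XOR of four left rotations
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = make_sbox()


def sub_bytes(s):
    return [SBOX[b] for b in s]


def shift_rows(s):
    return [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]


def mix_columns(s):
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gmul(a[r], 2) ^ gmul(a[(r + 1) % 4], 3)
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out


def zero_sum_states(seed=0):
    """Return the bytewise XOR sums of #3, #5, #7 and #9 for a Lambda-set at #2."""
    rng = random.Random(seed)
    const = [rng.randrange(256) for _ in range(16)]
    keys = [[rng.randrange(256) for _ in range(16)] for _ in range(4)]  # stand-ins for $1..$4
    states = [[v] + const[1:] for v in range(256)]      # #2: byte 0 is A, the rest are C
    sums = {}
    for rnd, key in enumerate(keys, start=1):
        states = [[x ^ k for x, k in zip(s, key)] for s in states]     # #3, #5, #7, #9
        sums["#%d" % (2 * rnd + 1)] = [reduce(xor, (s[i] for s in states), 0)
                                       for i in range(16)]
        states = [mix_columns(shift_rows(sub_bytes(s))) for s in states]
    return sums


sums = zero_sum_states()
```

A bytewise XOR sum of zero implies a zero XOR sum in each of the eight bit positions, which is the form used by the attacks that follow.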

Fig. 2. Bitwise multiset characteristics over 4-round AES-128 

3.2 Chosen-Plaintext Bitwise Multiset Attack 

The bitwise zero-sum property allows us to develop chosen-plaintext key recovery 
attacks using leaked bits at a fixed position in #3, #5, #7 or #9. Our attack 
first guesses 4 bytes of the key $0, and chooses a set of 2^8 plaintexts resulting 
in a Λ-set in #2. If the 4 bytes of $0 are correctly guessed, the bitwise XOR sum of 
the 2^8 leaked bits in any bit position of #3 to #10 is zero (Proposition 1). 
Otherwise, the probability that the bitwise XOR sum of leaked bits of #5, #7 and 
#9 is zero is 2^{-1}. If this procedure is repeated with N different sets of 2^8 
plaintexts, wrong keys can be detected with a probability of (1 - 2^{-N}). 

First, we prepare a table of 2^32 plaintexts in which all values of #0_0, #0_5, 
#0_10, #0_15 appear once and the other 12 bytes are fixed, together with the 
corresponding leaked bits. Assuming that the leaked bits can come from any 
position of #5, #7 or #9, our attack is performed as follows: 

1. Guess $0_0, $0_5, $0_10, $0_15 (4 bytes) and choose #2_1, #2_2, #2_3 (3 bytes). 

2. Compute the 2^8 values of the 4 bytes #0_0, #0_5, #0_10, #0_15 backwards from 
all 2^8 values of #2_0. 

3. Get 2^8 leaked bits by accessing the prepared table, and compute the XOR 
sum of the 2^8 leaked bits. 

4. Repeat steps 1 to 3 N times with different values of #2_1, #2_2, #2_3. If all N 
XOR-sums are zero, regard it as a correct key. 

5. Repeat steps 1 to 4 with all 2^32 key candidates for $0_0, $0_5, $0_10, $0_15. 

6. Repeat steps 1 to 5 three times with the other three 4-byte sets of the key 
$0 and the corresponding bitwise multiset characteristics and tables. 

The number of surviving keys after the above procedure is estimated as 
(1 + 2^{-N} x (2^32 - 1))^4. If the remaining key candidates are exhaustively 
searched, the time complexity is estimated as 
{(2^32 x 2^8 x N) x 4} + (1 + 2^{-N} x (2^32 - 1))^4 encryptions. 
When N = 22, the time complexity is estimated as 2^{46.46} encryptions, the required 
data is 2^34 (= 2^32 x 4) chosen plaintexts and the required memory is 2^34 bits. This 
attack is successful if leaked bits come from any bits of #5, #7 and #9 without 
any knowledge of the location of leaked bits. 
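The estimate for N = 22 can be re-derived with a few lines of arithmetic. The sketch below evaluates the two terms of the time complexity formula above separately; the function name sieve_cost is ours.

```python
from math import log2


def sieve_cost(n_sets):
    """Return log2 of (sieving cost, surviving keys, total) for N = n_sets."""
    sieving = (2 ** 32) * (2 ** 8) * n_sets * 4               # {(2^32 x 2^8 x N) x 4}
    survivors = (1 + 2.0 ** -n_sets * (2 ** 32 - 1)) ** 4     # keys left for exhaustive search
    return log2(sieving), log2(survivors), log2(sieving + survivors)


sieve_log2, survivors_log2, total_log2 = sieve_cost(22)
```

For N = 22 the sieving term is about 2^{46.46} and the exhaustive search over the surviving keys costs roughly 2^40, so the sieving term dominates the total.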

3.3 Partial Key Recovery Attack Using Leaked Bits from #3 

If leaked bits come from #3, a 32-bit partial key-recovery attack is feasible, as 
AES takes 2 rounds to achieve full diffusion. If the 4 bytes of the key $0 are 
guessed correctly, the 2^8 multiset in only one byte of #3 is not C as shown in 
Fig. 2, while for a wrong key, the 2^8 multisets in 4 bytes of one column are not C. 
We exploit the gap in the number of C bytes in #3 between a correct key and a 
wrong key. 

We guess the column in #3 where leaked bits come from and then guess the 
corresponding 4 bytes of $0. We check whether the 2^8 multiset of leaked bits 
is fixed with N different sets of 2^8 plaintexts. A correct key can be detected 
with probability of (1 - 2^{-8N}) if leaked bits come from a byte which is C for 
a correct key and B for a wrong key. We repeat this 4 x 4/3 times by guessing 
different columns and the byte position of leaked bits in #3 and the corresponding 
4 bytes of $0. The corresponding 32 bits of the key $0 can be recovered with 
about 2^44 (≈ 2^32 x 2^8 x 4 x 4 x 4/3) encryptions when N = 4, 2^34 chosen 
plaintexts and 2^34 memory. 


3.4 Chosen-Ciphertext Bitwise Multiset Attack 

In the chosen-ciphertext setting, backward direction attacks are feasible by using 
leaked bits from #13, #15, #17 or #19. As shown in Fig. 3, if 4 bytes of $10 are 
correctly guessed and a set of ciphertexts is properly chosen, the XOR-sum of the 
2^8 multiset of any bit in #12 to #17 is zero (Proposition 1). Since states #13, #15 
and #17 correspond to #7, #5 and #3, respectively, chosen-ciphertext attacks 
using these bits are feasible in the same manner as for chosen-plaintext attacks. 

Also, #19 is affected by only one byte of $10. Thus, one byte of $10 can be 
recovered by exhaustive search with 8 leaked bits from different ciphertexts 
after guessing the 128 positions of the leaked bit. Time complexity is estimated as 
2^18 (= 2^8 x 128 x 8) encryptions, the required data is about 2^8 known ciphertexts, 
and the memory consumption is negligible. 

Fig. 3. Bitwise multiset characteristics in 4-round AES-128 in backward direction 


3.5 Combined Key Recovery Attacks on AES 

Finally, we introduce a key recovery attack on the full AES-128 by combining 
the forward and the backward direction attacks. Since we do not know in which 
round the bits leak, we guess it and then mount each round attack in the following 
order: #19 → #17 → #3 → #5 → #7 → #9 → #13 → #15, i.e., if a correct 
key is not found by the guessed-round attack, the next round attack is applied 
in that given order. Our attacks find a correct key successfully except in the case 
where the leaked bits come from #11. Thus the success probability without any 
knowledge of the locations of leaked bits is 0.889 (= 8/9). 
Time complexity is estimated as 2^48 (≈ 2^18 + 2^44 + 2^44 + 2^{46.46} + 2^{46.46}) 
encryptions. The required data is about 2^35 (= 2^34 + 2^34) chosen plaintexts and 
2^34 chosen ciphertexts and the required memory is 2^34 bits. Note that if the 
leaked bits come from #3, #17, #19, partial key recovery attacks are possible. 
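The combined figures can be tallied as follows. The grouping of the five cost terms by leaking state is our reading of the attack order above and should be taken as an assumption; the exponents are the ones stated in Sects. 3.2 to 3.4.

```python
from math import log2

# log2 cost of each sub-attack, grouped by the state that is assumed to leak
costs_log2 = {
    "#19": 18.0,
    "#17": 44.0,
    "#3": 44.0,
    "#5/#7/#9": 46.46,
    "#13/#15": 46.46,
}
total_log2 = log2(sum(2.0 ** c for c in costs_log2.values()))

# 9 candidate leak states (#3, #5, ..., #19); only #11 is not covered
success_probability = 8 / 9
```

The sum evaluates to about 2^{47.7}, which the text rounds up to 2^48.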

4 Uncertainty and Differential Bias Attacks 

The attacker can also have limited control over the execution environment. 
In particular, the time and space can be uncertain. We assume now that the 
attacker does not know bit positions and/or the number of rounds of leaked bits. 
Moreover, the values leaked can be incorrect due to noise or other operations 
executed in parallel to encryption/decryption. This can happen both for purely 
technical reasons on a complex multi-process platform and due to countermea- 
sures. This section deals with these uncertainties and develops a cryptanalytic 
technique that is coined differential bias attack. 

In a nutshell, the technique works as follows. Let Z_i be a leaked bit from 
the i-th execution of the encryption function. Our attacks observe a stream of 
leaked bits, Z_1, Z_2, Z_3, ..., and recover the correct key by applying techniques 
of distinguishing attacks from the domain of stream ciphers [15,16,23]. More 
specifically, we guess a part of the key $0, and set well-chosen differences for a 
pair of plaintexts resulting in biased differential states, where the distribution of 
bitwise differences is biased, if the part of key $0 is correctly guessed. As a leaked 
bit stream from biased differential states is also biased, we are able to detect the 
bit stream corresponding to the correct key by checking the bias on the differences 
of bits. Also, if leakage after round 9 is available, a more powerful attack, called 
biased state attack, is feasible by using similar techniques. 

Formalization of Uncertain Time and Space. We assume a random unknown 
round (time) and/or bit position (space) within the round of the leak. Again, each 
execution of encryptions leaks only one bit of internal states at the random loca- 
tion. More formally, leaked bits are assumed to randomly come from the target 
space of internal states. For example, if the target space consists of all states after 
each round function of the data transform and key schedule, it is the leakage from 
states #0, #3, . . . , #19, #21 and states $0, $1, . . ., $10. A target space can be a 
subset of those states if some rounds are black-boxed (and, thus, not visible to the 
attacker). 


4.1 Truncated Differential Characteristic 

Our attacks utilize the bytewise truncated differential characteristic of Fig. 4, 
where a colored cell is a probability-one non-zero truncated difference, a blank 
cell is a probability-one zero truncated difference, and ? is an unknown truncated 
difference. Define 4 bytes of differences {Δ#0_0, Δ#0_5, Δ#0_10, Δ#0_15} in a pair 
of plaintexts as (Δ#0_0, Δ#0_5, Δ#0_10, Δ#0_15) = S^{-1}(MC^{-1}(Δ#2_0, 0, 0, 0)), 
where S^{-1} and MC^{-1} are the inverses of SubBytes and MixColumns in a column, 
respectively, and Δ#2_0 is an arbitrary byte difference in #2_0. Given {Δ#2_0, 
#2_0, ..., #2_3} and {$0_0, $0_5, $0_10, $0_15}, {Δ#0_0, Δ#0_5, Δ#0_10, Δ#0_15} and 
{#0_0, #0_5, #0_10, #0_15} are determined. Let #0' be a plaintext having the 
differences {Δ#0_0, Δ#0_5, Δ#0_10, Δ#0_15}, i.e., #0'_0 = #0_0 ⊕ Δ#0_0, #0'_5 = 
#0_5 ⊕ Δ#0_5, #0'_10 = #0_10 ⊕ Δ#0_10, #0'_15 = #0_15 ⊕ Δ#0_15. Also, let 
#'1, ..., #'21 be the corresponding states of #0', and Z'_1, Z'_2, Z'_3, ... be the 
leaked bits of each execution of #0'. 

Fig. 4. Truncated differential characteristic over 3-round AES-128 


4.2 Biased Differential State 

Choosing the 4-byte differences {Δ#0_0, Δ#0_5, Δ#0_10, Δ#0_15} properly and 
guessing the 4 bytes of {$0_0, $0_5, $0_10, $0_15} correctly, we are able to create 
biased differential states in #3, consisting of 15 bytes of probability-one zero 
differences and 1 byte of a probability-one non-zero difference, in #5, consisting 
of 12 bytes of probability-one zero differences and 4 bytes of probability-one 
non-zero differences, and in #7, consisting of 16 bytes of probability-one 
non-zero differences. As shown in Fig. 4, a correct key has 27 bytes of 
probability-one zero differences, #3_1, ..., #3_15 and #5_4, ..., #5_15, and 21 bytes 
of probability-one non-zero differences, #3_0, #5_0, ..., #5_3, and #7_0, ..., #7_15, 
while a wrong key has only 12 bytes of probability-one zero differences, 
#3_4, ..., #3_15, and does not have any probability-one non-zero difference in the 
state of the data processing part. 

In addition, a pair of plaintexts has 12 bytes of probability-one zero 
differences and 4 bytes of probability-one non-zero differences for both a correct 
key and a wrong key. Also, the key schedule has 176 (= 16 x 11) bytes of 
probability-one zero differences, as the subkeys are always fixed under the same key. 


4.3 Bitwise Differential Bias in Biased Differential State 

For a probability-one zero/non-zero truncated difference, we derive positive and 
negative bitwise differential biases. Our attack exploits the gap in the number 
of positive and negative biases between a correct key and a wrong key when a 
pair of #0 and #0' is encrypted. 

Positive Bitwise Bias for Probability-One Zero Truncated Difference. 
If a bytewise pair #x_y and #'x_y has a probability-one zero truncated difference, 
the bitwise difference at the same position is also zero with probability one: 
Pr(Δ[#x_y[j], #'x_y[j]] = 0) = 1, 0 ≤ j ≤ 7, where Δ[a, b] = a ⊕ b. A correct key 
has 1720 (= 27 x 8 + 176 x 8 + 12 x 8) positive bitwise differential biases, while 
a wrong key has only 1600 (= 12 x 8 + 176 x 8 + 12 x 8) such biases. 

Negative Bitwise Bias for Probability-One Non-zero Truncated 
Difference. If a pair #x_y and #'x_y has a probability-one non-zero truncated 
difference, the probability that the bitwise difference at the same bit position is 
zero is estimated as follows: 
Pr(Δ[#x_y[j], #'x_y[j]] = 0) = 127/255 = 1/2 · (1 - 2^{-7.99}). 
In experiments with 2^40 randomly-chosen plaintexts and keys, we confirmed that 
these negative biases toward zero exist in each bit of the probability-one non-zero 
truncated difference, where the experimental value is Pr(Δ[#7_i[j], #'7_i[j]] = 
0) = 1/2 · (1 - 2^{-7.92}). 
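The value 127/255 can be reproduced by direct enumeration. The sketch below assumes, as the estimate above implicitly does, that a non-zero byte difference is uniformly distributed over its 255 possible values; the function name is ours.

```python
from fractions import Fraction
from math import log2


def zero_bit_probability(bit):
    """Probability that bit `bit` of a uniformly random non-zero byte is 0."""
    zeros = sum(1 for d in range(1, 256) if (d >> bit) & 1 == 0)
    return Fraction(zeros, 255)


probabilities = [zero_bit_probability(j) for j in range(8)]

# Write the probability as 1/2 * (1 - 2^e) and solve for the exponent e.
bias_exponent = log2(1 - 2 * float(probabilities[0]))
```

Each bit position gives the same probability, because removing the all-zero byte takes exactly one value away from the 128 bytes whose bit j is zero.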

A correct key has 200 (= 21 x 8 + 4 x 8) negative bitwise differential biases, 
while a wrong key has 32 (= 0 + 4 x 8) ones. The summary of bitwise 
positive/negative differential biases for the truncated differential of Fig. 4 is 
shown in Table 2. 


Table 2. Bitwise differential biases for truncated differential of Fig. 4 

              Positive biases toward zero                          Negative biases toward zero 
Correct key   #3_i[j] (1 ≤ i ≤ 15, 0 ≤ j ≤ 7)                      #3_0[j] (0 ≤ j ≤ 7) 
              #5_i[j] (4 ≤ i ≤ 15, 0 ≤ j ≤ 7)                      #5_i[j] (0 ≤ i ≤ 3, 0 ≤ j ≤ 7) 
                                                                   #7_i[j] (0 ≤ i ≤ 15, 0 ≤ j ≤ 7) 
Wrong key     #3_i[j] (4 ≤ i ≤ 15, 0 ≤ j ≤ 7)                      - 
Both keys     #0_i[j] (i ≠ 0, 5, 10, 15, 0 ≤ j ≤ 7)                #0_i[j] (i = 0, 5, 10, 15, 0 ≤ j ≤ 7) 
              $x_i[j] (0 ≤ x ≤ 10, 0 ≤ i ≤ 15, 0 ≤ j ≤ 7) 


4.4 Bitwise Differential Biases in the Stream of Leaked Bits 

Suppose that the values of the other bits of the states in the data processing part 
and the key schedule are randomly distributed, i.e., the probability that 
differences of other bitwise pairs become zero is 2^{-1}. Let N_all, N_bias_p, 
N_bias_n, and N_random be the number of bitwise pairs in the entire space, the 
positive biased space (toward zero), the negative biased space (toward zero) and 
the randomly-distributed space, respectively, and let x^{(c)} and x^{(w)} be those 
of a correct key and a wrong key, respectively (see Fig. 5). The probabilities 
that a difference of a bitwise pair of randomly-chosen leaked bits is zero 
(Δ[Z_i, Z'_j] = 0) for a correct key and a wrong key are estimated as follows: 

\[
\Pr_c(\Delta[Z_i, Z'_j] = 0) = \frac{1}{2} \cdot \frac{N^{(c)}_{random}}{N_{all}} + \frac{N^{(c)}_{bias_n}}{N_{all}} \cdot \frac{127}{255} + \frac{N^{(c)}_{bias_p}}{N_{all}},
\]
\[
\Pr_w(\Delta[Z_i, Z'_j] = 0) = \frac{1}{2} \cdot \frac{N^{(w)}_{random}}{N_{all}} + \frac{N^{(w)}_{bias_n}}{N_{all}} \cdot \frac{127}{255} + \frac{N^{(w)}_{bias_p}}{N_{all}}.
\]

Our attack observes leaked bits Z_0, Z_1, Z_2, Z_3, ... and Z'_0, Z'_1, Z'_2, Z'_3, ..., 
and then computes the probability of Δ[Z_i, Z'_j] = 0 in order to distinguish a 
stream coming from the distribution for a correct key from streams coming from the 
distribution for a wrong key. 

The number of required samples for distinguishing the two distributions with 
probability of 1 - α is given by the following lemmata. 

Lemma 1. [15,16] Let X and Y be two distributions and suppose that the 
independent events E occur with probabilities Pr_X(E) = p in X and Pr_Y(E) = 
(1 + q) · p in Y. Then the discrimination D of the distributions is p · q^2. 

Lemma 2. [15] The number of samples N_sample that is required for distinguishing 
two distributions that have discrimination D with success probability 1 - α 
is (1/D) · (1 - 2α) · log_2((1 - α)/α). 

Assuming that the target event E is Δ[Z_i, Z'_j] = 0, X is the distribution for 
a wrong key, and Y is the distribution for a correct key, p and q are estimated 
as p = Pr_w(Δ[Z_i, Z'_j] = 0) and q = Pr_c(Δ[Z_i, Z'_j] = 0)/Pr_w(Δ[Z_i, Z'_j] = 0) - 1, 
which follows from Pr_Y(E) = (1 + q) · p. With α = 2^{-32} and p ≈ 1/2, the 
sample requirement is estimated as 

\[
N_{sample} = (p q^2)^{-1} \cdot (1 - 2 \cdot 2^{-32}) \cdot \log_2 \frac{1 - 2^{-32}}{2^{-32}} \approx 2 \cdot 32 \cdot q^{-2} = 2^6 \cdot q^{-2}.
\]


4.5 Chosen-Plaintext Differential Bias Attack 

First, this attack prepares 2^32 chosen plaintexts in which all 2^32 values of #0_0, 
#0_5, #0_10, #0_15 appear once and the other 12 bytes are fixed, and obtains N_s 
leaked bits for each plaintext, i.e., each plaintext is encrypted N_s times. Given a 
pair of P and P', N_s^2 (= N_s x N_s) pairs of leaked bits are obtained as shown 
in Fig. 6. After we make a table of the values of {#0_0, #0_5, #0_10, #0_15} and 
the corresponding N_s leaked bits, our attack is performed as follows: 



1. Guess the 4 bytes of key $0_0, $0_5, $0_10, $0_15, and choose Δ#2_0, #2_0, ..., #2_3. 

2. Compute a pair of 4 bytes of plaintexts, #0_0, #0_5, #0_10, #0_15 and #0'_0, #0'_5, 
#0'_10, #0'_15, resulting in biased #3, #5 and #7 states if the key is correctly 
guessed. 

3. Get N_s^2 pairs of leaked bits Δ[Z_i, Z'_j], 0 ≤ i, j < N_s, by accessing the 
prepared table. 

4. Repeat steps 2-3 N_sample/N_s^2 times with different values of #2. 

5. Check whether the distribution of the N_sample pairs is the one for a correct key. 
If so, regard it as a candidate for the correct key. 

6. Repeat steps 1 to 5 with all 2^32 candidates of keys $0_0, $0_5, $0_10, $0_15. 

7. Repeat steps 1 to 6 three times with the other three 4-byte sets of the 
key $0, the corresponding truncated differential characteristics, and the tables of 
plaintexts and leaked bits. 

Fig. 5. Bias in leaked stream 

Fig. 6. Bitwise pairs of leaked bits 

In steps 3 to 5, we check N_sample pairs to detect a stream coming from 
the biased distribution for a correct key. In step 3, we count the number 
of the events Δ[Z_i, Z'_j] = 0, and estimate the probability Pr(Δ[Z_i, Z'_j] = 0). 
The straightforward method requires N_s^2 operations to check all N_s^2 pairs. 
To improve it, we first calculate the number of Z_i = 0, 0 ≤ i < N_s, defined 
as N_zero. Then the number of Δ[Z_i, Z'_j] = 0 is estimated as 

\[
\bigl( N_{zero} \times (\bar{Z}'_0 + \cdots + \bar{Z}'_{N_s - 1}) + (N_s - N_{zero}) \times (Z'_0 + \cdots + Z'_{N_s - 1}) \bigr) / N_{all},
\]

where \bar{a} is the complement of a, so that the first sum counts the zeros and the 
second sum counts the ones among the Z'_j. These costs are estimated as 
N_s + (N_s + N_s) additions and multiplications. This is assumed to be less than 
N_s one-round encryptions. The number of surviving keys after the above procedure 
is estimated as (1 + 2^{-32} x (2^32 - 1))^4. If the remaining key candidates are 
exhaustively searched, the entire time complexity is estimated as 
(2^32 x 4 x N_sample/N_s x 1/10) + (1 + 2^{-32} x (2^32 - 1))^4 ≈ 2^31 x N_sample/N_s 
encryptions and the required data is 2^34 x N_s (= 4 x 2^32 x N_s) chosen 
plaintexts with leaked bits. The memory requirement is 2^34 x N_s bits. 
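The counting shortcut can be compared against the straightforward method in a short sketch. The function names, the seed and the stream length of 256 bits are illustrative assumptions; the leaked streams are replaced by random bits.

```python
import random


def count_zero_pairs_naive(z, z_prime):
    """Check all N_s^2 pairs (Z_i, Z'_j) one by one."""
    return sum(1 for a in z for b in z_prime if a ^ b == 0)


def count_zero_pairs_fast(z, z_prime):
    """Same count from three tallies: zeros in Z, zeros in Z', ones in Z'."""
    n_zero = z.count(0)                        # N_zero
    ones_prime = sum(z_prime)                  # number of Z'_j equal to 1
    zeros_prime = len(z_prime) - ones_prime    # number of Z'_j equal to 0
    return n_zero * zeros_prime + (len(z) - n_zero) * ones_prime


rng = random.Random(2015)                      # arbitrary seed for a reproducible demo
z = [rng.getrandbits(1) for _ in range(256)]
z_prime = [rng.getrandbits(1) for _ in range(256)]
fraction_zero = count_zero_pairs_fast(z, z_prime) / (len(z) * len(z_prime))
```

A pair has a zero difference exactly when both bits are 0 or both are 1, so two products of tallies replace the N_s^2 individual comparisons.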


4.6 Chosen-Ciphertext Differential Bias Attack 

If the decryption function is accessible, chosen-ciphertext attacks are applicable. 
Similarly to the setting of bitwise multiset attacks before, the chosen-ciphertext 
attacks are more efficient and it makes sense to black-box the output of round 
9 also in the cases with time and space uncertainty. 

As shown in Fig. 7, the states #13, #15 and #17 correspond to #7, #5 
and #3, respectively. Since the state #19 consists of 12 probability-one zero 
truncated differences and 4 probability-one non-zero truncated differences, both 
for a correct key and a wrong key, one additionally has 96 positive and 32 negative 
bitwise differential biases in the chosen-ciphertext attack. 

Fig. 7. Differential characteristic over 4-round AES-128 in backward direction 

Biased State Attack of #19: Leakage After Round 9. If leaked bits from 
#19 are obtained, a more powerful attack is feasible. Each byte in #19 can 
be controlled by one byte of $10 and one byte of a ciphertext. Thus, we are 
able to create a biased state in #19 whose one byte (8 bits) is fixed to 0, if the 
corresponding byte of $10 is correctly guessed and the respective byte of the 
ciphertext is properly chosen. Suppose that the values of the other bits of the 
states are randomly distributed. The probability that each leaked bit is zero 
(Z_i = 0) for a correct key is estimated as 
Pr_c(Z_i = 0) = 1/2 · (N'^{(c)}_random / N'_all) + N'^{(c)}_bias_p / N'_all, 
where N'_all, N'_bias_p, and N'_random are the numbers of bits in the entire 
space, the positive biased space and the randomly-distributed space, respectively. 
Also, Pr_w(Z_i = 0) is assumed to be 1/2. 

Assuming that the target event E is Z_i = 0, p and q are estimated as p = 1/2 
and q = N'^{(c)}_bias_p / N'_all. For the success rate of 1 - 2^{-8} (α = 2^{-8}), the 
sample requirement is estimated as N'_sample ≈ 2 · 8 · q^{-2} = 2^4 · q^{-2}. We repeat 
the procedure for all 16 bytes of $10. Therefore, time complexity is estimated as 
2^12 x N'_sample (= 16 x 2^8 x N'_sample) encryptions and the required data is 
2^12 x N'_sample (= 16 x 2^8 x N'_sample) chosen ciphertexts. The memory requirement 
is negligible. 
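As a worked instance of these estimates, the sketch below plugs in N'_all = 256 and N'^{(c)}_bias_p = 8, the values listed for round 9 in Table 3 (one fixed byte among the 256 bits of state and subkey). The function name and the parameter names are ours.

```python
from math import log2


def biased_state_cost(n_all=256, n_bias=8, alpha_log2=-8):
    """Return log2 of (q, N'_sample, time) for the biased state attack."""
    q = n_bias / n_all
    # N'_sample ~ (1/D) * log2(1/alpha) with D = p * q^2 and p = 1/2
    n_sample = 2 * (-alpha_log2) * q ** -2
    time = 16 * 2 ** 8 * n_sample              # 16 key bytes, 2^8 guesses each
    return log2(q), log2(n_sample), log2(time)


q_log2, n_sample_log2, time_log2 = biased_state_cost()
```

With these inputs q = 2^{-5}, N'_sample = 2^14 and the time complexity is 2^26 encryptions, in line with the last row of Table 3.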

4.7 Known-Plaintext Differential Bias Attack 

Finally, we introduce a known-plaintext differential bias attack using the truncated 
differential characteristic of Fig. 8. For a correct key, one has 24 (= 3 x 8) positive 
bitwise differential biases toward zero and 8 negative bitwise differential biases 
in #3, while for a wrong key, there are no such biases. The key schedule has 
the same number of positive biases as in the chosen-plaintext attacks and the 
plaintext has 32 (= 4 x 8) negative biases for both a correct and a wrong key. 

This attack prepares 2^33 known plaintexts and makes a table of #0_0, #0_5, 
#0_10, #0_15 and the corresponding N_s leaked bits. The expected number of 
entries for each value of #0_0, #0_5, #0_10, #0_15 is more than 1. We mount key 
recovery attacks for $0_0, $0_5, $0_10, $0_15 in the same manner as in the 
chosen-plaintext attack. In step 3, the prepared table contains the corresponding 
values of #0_0, #0_5, #0_10, #0_15 with high probability. Thus, time complexity is 
estimated as 2^31 x N_sample/N_s encryptions, the required data is 2^35 x N_s 
(= 4 x 2^33 x N_s) known plaintexts with leaked bits and the required memory is 
about 2^35 x N_s bits. 

Fig. 8. Differential characteristic over 3-round AES-128 for known plaintext attack 


5 AES Under Leakage with Uncertainty in Time/Space 

This section evaluates the security of AES if the attacker is uncertain about time 
and space, that is, if the round of the leak and/or the bit position of the leak 
within the round are randomized. Since the multiset of leaked bits at the fixed 
location is not available in the random unknown setting, our bitwise multiset 
attacks are not applicable to these variants. Thus, we estimate the costs of 
differential (state) bias attacks on each variant of AES with countermeasures as 
shown in Fig. 1. 

Formalization of Time/Space Uncertainty for AES. We speak of randomized 
time, when one bit of the state information is leaked at a fixed bit position 
after a random number of rounds, e.g., #(2x + 1)_10[7] (0 ≤ x ≤ 10) or $x_3[4] 
(0 ≤ x ≤ 10). We speak of randomized space, when one bit of the state 
information is leaked at a random bit position after a fixed number of rounds, e.g., 
{#17_i[j], $8_i[j]} (0 ≤ i ≤ 15, 0 ≤ j ≤ 7). Randomized time and space occur 
when one bit of state information is leaked at a random bit position after a 
random number of rounds, e.g., #(2x + 1)_i[j] (0 ≤ x ≤ 10, 0 ≤ i ≤ 15 and 
0 ≤ j ≤ 7) or $x_i[j] (0 ≤ x ≤ 10, 0 ≤ i ≤ 15 and 0 ≤ j ≤ 7). 


5.1 Uncertainty in Space 

The space randomization makes the bit position of leaked bits random in each 
execution, i.e., Z_i and Z'_j randomly come from two 256-bit spaces consisting of a 
128-bit state in the data processing part and a 128-bit state in the key schedule 
at the unknown fixed round, assuming encryptions are executed with a 256-bit 
working memory for an internal state and a subkey. 

Assuming that leaked bits come from the states after round 2, i.e., {#5_i[j] and 
$2_i[j]} and {#'5_i[j] and $'2_i[j]} (0 ≤ i ≤ 15, 0 ≤ j ≤ 7), the parameters of our 
differential bias attacks are chosen as N_all = (256)^2, N^{(c)}_bias_p = 224 
(= 96 + 128), N^{(w)}_bias_p = 128 (= 0 + 128), N^{(c)}_bias_n = 32 and 
N^{(w)}_bias_n = 0 (see Table 2). Then, Pr_c(Δ[Z_i, Z'_j] = 0) and 
Pr_w(Δ[Z_i, Z'_j] = 0) are estimated as 1/2 · (1 + 2^{-8.192}) and 
1/2 · (1 + 2^{-9.000}), respectively, and q = 2^{-9.42}. In our experiment with 2^40 
randomly-chosen correct and wrong pairs of keys and plaintexts, 
Pr_c(Δ[Z_i, Z'_j] = 0) and Pr_w(Δ[Z_i, Z'_j] = 0) are 1/2 · (1 + 2^{-8.191}) and 
1/2 · (1 + 2^{-9.001}), respectively, and q = 2^{-9.42}. The number of required 
samples to detect a stream for a correct key is estimated as N_sample = 2^{24.84} 
(= 2^6 x 2^{9.42 x 2}). We experimentally confirmed that this number of samples is 
enough for a successful attack. With N_s = (N_all)^{1/2}, time complexity is 
estimated as 2^{47.84} (= (2^31 x 2^{24.84})/2^8) encryptions and the required data 
is 2^42 (= 2^34 x 2^8) chosen plaintexts and corresponding leaked bits 
with 2^42 bits of prepared tables. 
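The theoretical figures of this example can be re-derived from the formulas of Sects. 4.4 and 4.5. The sketch below is a plain transcription of those formulas with α = 2^{-32}, p ≈ 1/2 and N_s = (N_all)^{1/2}; the function and parameter names are ours, and N_random is taken as the remainder of N_all, which is an assumption consistent with the definitions in Sect. 4.4.

```python
from math import log2


def zero_prob(n_all, n_bias_p, n_bias_n):
    """Pr(difference = 0) for given sizes of the biased spaces."""
    n_random = n_all - n_bias_p - n_bias_n
    return 0.5 * n_random / n_all + n_bias_n / n_all * (127 / 255) + n_bias_p / n_all


def attack_estimate(n_all, pos_c, pos_w, neg_c, neg_w, alpha_log2=-32):
    """Return log2 of q, N_sample, time and data for a differential bias attack."""
    p_c = zero_prob(n_all, pos_c, neg_c)
    p_w = zero_prob(n_all, pos_w, neg_w)
    q = p_c / p_w - 1                          # from Pr_Y(E) = (1 + q) * p
    n_sample = 2 * (-alpha_log2) / q ** 2      # approx. (p * q^2)^-1 * log2(1/alpha)
    n_s = n_all ** 0.5
    values = {
        "q": q,
        "n_sample": n_sample,
        "time": 2 ** 31 * n_sample / n_s,
        "data": 2 ** 34 * n_s,
    }
    return {name: log2(v) for name, v in values.items()}


round2 = attack_estimate(256 ** 2, 224, 128, 32, 0)
```

The same function can be applied to other parameter sets with a positive q, for instance attack_estimate(11 ** 2, 3, 2, 1, 0) for the time-randomized setting of Sect. 5.2.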

The details of attacks for N_s = (N_all)^{1/2} are provided in Table 3, where q^{(e)} 
is our experimental value with 2^40 randomly-chosen correct and wrong pairs of keys 
and plaintexts/ciphertexts, and T and D are time complexity and the amount 
of the required data, respectively. Our theoretical values closely approximate 
the experimental data in all cases. Since an attacker does not know the round 
number of leaked bits, the attacker first guesses the round of leaked bits and 
then mounts an attack similar to the combined attack of the bitwise multiset 
attacks. If the decryption is accessible, our attacks are successful except in the 
case where leaked bits come from states after round 4 or 5 only. Also, a known 
plaintext attack is possible if leaked bits from #3 are available. 


5.2 Uncertainty in Time 


The time randomization makes the round number of leaked bits random in each 
execution, i.e., Z_i and Z'_j come from the fixed bit position at a random round 
of the data processing part. Additionally, we take into account the leaked bits 
from plaintexts #0 or ciphertexts #21 in the data processing part. For instance, 
assuming that leaked bits come from the 33rd bit of the data processing part, 
i.e., #0_4[1], #3_4[1], ..., #19_4[1] or #21_4[1], the attack parameters are given as 
N_all = 11^2, N^{(c)}_bias_p = 3, N^{(w)}_bias_p = 2, N^{(c)}_bias_n = 1 and 
N^{(w)}_bias_n = 0. Then q = 2^{-6.95}, N_sample = 2^{19.9}, T = 2^{47.44} and 
D = 2^{37.46}. 

The details of our attacks using leaked bits from the data processing part are
provided in Table 4, where the attack parameters of chosen-plaintext differential
bias attacks depend on the positions of leaked bits, but time and data complexities
are almost the same for each position. We also evaluate a chosen-plaintext
attack when round 9 is black-boxed and when rounds 1, 2, 8 and 9 are black-boxed, i.e., {#19,
$9} and {#3, #5, #17, #19, $1, $2, $8, $9} are not available, respectively. Other
black-boxed variants are also evaluated by properly choosing attack parameters.
Since an attacker does not know the bit position of the leaked bits, he first guesses
it and then mounts an attack. If decryption is accessible, our attacks are feasible
as long as leaked bits after round 1, 2, 3, 6, 7, 8 or 9 in the data processing
part are available. A known-plaintext attack is applicable if leaked bits from #3
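are obtained. However, it is a 32-bit key recovery attack, because a bit of #3 is
affected by 32 bits of $0.

Table 3. Differential bias and biased state attacks for space randomization

Chosen-plaintext (ciphertext) differential bias attack

| Round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(N^{c}_{bias_n}\) | \(N^{w}_{bias_n}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| 1 (8) | \(256^2\) | 248 | 224 | 8 | 0 | \(2^{-11.42}\) | \(2^{-11.38}\) | \(2^{28.84}\) | \(2^{51.84}\) | \(2^{42.00}\) CP(CC) |
| 2 (7) | \(256^2\) | 224 | 128 | 32 | 0 | \(2^{-9.42}\) | \(2^{-9.42}\) | \(2^{24.84}\) | \(2^{47.84}\) | \(2^{42.00}\) CP(CC) |
| 3 (6) | \(256^2\) | 128 | 128 | 128 | 0 | \(2^{-16.99}\) | \(2^{-16.84}\) | \(2^{39.98}\) | \(2^{62.98}\) | \(2^{42.00}\) CP(CC) |

Known-plaintext differential bias attack

| Round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(N^{c}_{bias_n}\) | \(N^{w}_{bias_n}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| 1 | \(256^2\) | 152 | 128 | 8 | 0 | \(2^{-11.42}\) | \(2^{-11.10}\) | \(2^{28.84}\) | \(2^{51.84}\) | \(2^{43.00}\) KP |

Chosen-ciphertext biased state attack

| Round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| 9 | 256 | 8 | 0 | \(2^{-5.00}\) | \(2^{-5.00}\) | \(2^{14}\) | \(2^{26.00}\) | \(2^{26.00}\) CC |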


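Table 4. Differential bias and biased state attacks for time randomization

Chosen-plaintext differential bias attack

| BB round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(N^{c}_{bias_n}\) | \(N^{w}_{bias_n}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| None | \(11^2\) | 3 | 2 | 1 | 0 | \(2^{-6.95}\) | \(2^{-6.94}\) | \(2^{19.90}\) | \(2^{47.44}\) | \(2^{37.46}\) CP |
| 9 | \(10^2\) | 3 | 2 | 1 | 0 | \(2^{-6.68}\) | \(2^{-6.68}\) | \(2^{19.36}\) | \(2^{47.04}\) | \(2^{37.32}\) CP |
| 1, 2, 8, 9 | \(7^2\) | 0 | 0 | 1 | 0 | \(2^{-13.61}\) | \(2^{-13.23}\) | \(2^{33.22}\) | \(2^{60.41}\) | \(2^{36.81}\) CP |

Known-plaintext differential bias attack

| BB round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(N^{c}_{bias_n}\) | \(N^{w}_{bias_n}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| None | \(11^2\) | 1 | 0 | 0 | 0 | \(2^{-6.92}\) | \(2^{-7.30}\) | \(2^{19.84}\) | \(2^{47.38}\) | \(2^{38.46}\) KP |

Chosen-ciphertext biased state attack

| BB round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| None | 11 | 1 | 0 | \(2^{-3.46}\) | \(2^{-3.45}\) | \(2^{10.92}\) | \(2^{22.92}\) | \(2^{22.92}\) CC |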


5.3 Uncertainty in Both Space and Time 

The space and time randomization makes both the bit position and the
round number of leaked bits random in each execution, i.e., \(Z_i\) and \(Z'_i\) randomly
come from any bit of any state in the data processing part {#0, #3, #5, ...,
#19, #21} and in the key schedule {$0, ..., $10}. The parameters of the chosen-plaintext
differential bias attacks are estimated as \(N_{all} = (256 \times 11)^2\),
\(N^{c}_{bias_p} = 1720\) \((= 27 \times 8 + 176 \times 8 + 12 \times 8)\),
\(N^{w}_{bias_p} = 1600\) \((= 12 \times 8 + 176 \times 8 + 12 \times 8)\),
\(N^{c}_{bias_n} = 200\) \((= 21 \times 8 + 0 + 4 \times 8)\) and
\(N^{w}_{bias_n} = 32\) \((= 0 + 0 + 4 \times 8)\).

The details of our attacks are given in Table 5. We also provide a chosen-plaintext
attack when round 9 is black-boxed and when rounds 1, 2, 8 and 9 are black-boxed. If
decryption is accessible, our attacks work as long as leaked bits after round 1, 2,
3, 6, 7, 8 or 9 of the data processing part are available. Also, a known-plaintext
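attack is applicable if leaked bits from #3 are observable.

Table 5. Differential bias and biased state attacks for space and time randomization

Chosen-plaintext differential bias attack

| BB round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(N^{c}_{bias_n}\) | \(N^{w}_{bias_n}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| None | \((256 \cdot 11)^2\) | 1720 | 1600 | 200 | 32 | \(2^{-16.02}\) | \(2^{-15.92}\) | \(2^{38.04}\) | \(2^{57.58}\) | \(2^{45.46}\) CP |
| 9 | \((256 \cdot 10)^2\) | 1592 | 1472 | 200 | 32 | \(2^{-15.75}\) | \(2^{-15.70}\) | \(2^{37.49}\) | \(2^{57.17}\) | \(2^{45.32}\) CP |
| 1, 2, 8, 9 | \((256 \cdot 7)^2\) | 896 | 896 | 128 | 0 | \(2^{-22.61}\) | \(2^{-23.07}\) | \(2^{51.22}\) | \(2^{71.41}\) | \(2^{44.81}\) CP |

Known-plaintext differential bias attack

| BB round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(N^{c}_{bias_n}\) | \(N^{w}_{bias_n}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| None | \((256 \cdot 11)^2\) | 1440 | 1408 | 40 | 32 | \(2^{-17.92}\) | \(2^{-17.69}\) | \(2^{41.84}\) | \(2^{61.38}\) | \(2^{46.46}\) KP |

Chosen-ciphertext biased state attack

| BB round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| None | \(256 \cdot 11\) | 8 | 0 | \(2^{-8.46}\) | \(2^{-8.44}\) | \(2^{20.92}\) | \(2^{32.92}\) | \(2^{32.92}\) CC |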


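Table 6. Differential bias and biased state attacks for leakage with noise

Chosen-plaintext differential bias attack

| BB round | Time (\(\pi = 1\)) | Data (\(\pi = 1\)) | Time (\(\pi = 2^{-10}\)) | Data (\(\pi = 2^{-10}\)) | Time (\(\pi = 2^{-20}\)) | Data (\(\pi = 2^{-20}\)) | Time (\(\pi = 2^{-30}\)) | Data (\(\pi = 2^{-30}\)) |
| None | \(2^{57.58}\) | \(2^{45.46}\) CP | \(2^{67.58}\) | \(2^{55.46}\) CP | \(2^{77.58}\) | \(2^{65.46}\) CP | \(2^{87.58}\) | \(2^{75.46}\) CP |
| 1, 2, 8, 9 | \(2^{71.41}\) | \(2^{44.81}\) CP | \(2^{81.41}\) | \(2^{54.81}\) CP | \(2^{91.41}\) | \(2^{64.81}\) CP | \(2^{101.41}\) | \(2^{74.81}\) CP |

Known-plaintext differential bias attack

| BB round | Time (\(\pi = 1\)) | Data (\(\pi = 1\)) | Time (\(\pi = 2^{-10}\)) | Data (\(\pi = 2^{-10}\)) | Time (\(\pi = 2^{-20}\)) | Data (\(\pi = 2^{-20}\)) | Time (\(\pi = 2^{-30}\)) | Data (\(\pi = 2^{-30}\)) |
| None | \(2^{61.38}\) | \(2^{46.46}\) KP | \(2^{71.38}\) | \(2^{56.46}\) KP | \(2^{81.38}\) | \(2^{66.46}\) KP | \(2^{91.38}\) | \(2^{76.46}\) KP |

Chosen-ciphertext biased state attack

| BB round | Time (\(\pi = 1\)) | Data (\(\pi = 1\)) | Time (\(\pi = 2^{-10}\)) | Data (\(\pi = 2^{-10}\)) | Time (\(\pi = 2^{-20}\)) | Data (\(\pi = 2^{-20}\)) | Time (\(\pi = 2^{-30}\)) | Data (\(\pi = 2^{-30}\)) |
| None | \(2^{32.92}\) | \(2^{32.92}\) CC | \(2^{52.92}\) | \(2^{52.92}\) CC | \(2^{72.92}\) | \(2^{72.92}\) CC | \(2^{92.92}\) | \(2^{92.92}\) CC |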


6 AES Under Noisy Leakage 

This section studies the effect of additional noise on top of the time and space
randomization. The noise can be due to the adversary's limited knowledge of the
platform or to implemented countermeasures such as the insertion
of dummy operations. In the differential bias attack, this reduces the rate of
positive/negative biased bits by adding noise bits into the space of the actually
leaked bits. To quantify the amount of noise present in the attack, we define
\(\pi\) as the probability that an observed bit is not a noise bit. Supposing that the
values of the noise bits are randomly distributed, the bias of a leaked bit stream
of the correct key with noise bits is estimated as \(q' = q \times \pi\), and the required
number of sample bits to distinguish a stream for a correct key increases by a
factor of \((\pi^2)^{-1}\) to \(N'_{sample} = N_{sample} \times (\pi^2)^{-1}\). With \(N_s = (N_{all})^{1/2} \times \pi^{-1}\),
the time and data complexities of our known/chosen plaintext differential bias
attacks increase by a factor of \(\pi^{-1}\), as
\(T \approx 2^{31} \times (N_{sample} \times \pi^{-2})/(N_s \times \pi^{-1}) = 2^{31} \times (N_{sample} \times \pi^{-1})/N_s\) encryptions and
\(D \approx 2^{34}\ (2^{35}) \times (N_s \times \pi^{-1})\)
chosen/known plaintexts with leaked bits, where \(N_s\) inside these two expressions
denotes the noise-free value \((N_{all})^{1/2}\). Also, the time and data complexities
of chosen-ciphertext biased state attacks increase by a factor of \(\pi^{-2}\). The
detailed evaluations for each value of \(\pi\) are shown in Table 6.
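
As a rough illustration of this scaling, the following Python sketch shifts a noise-free cost pair by the factors stated above. The function name `apply_noise` is hypothetical; the only inputs taken from the paper are the scaling rule (a factor \(\pi^{-1}\) for differential bias attacks, \(\pi^{-2}\) for biased state attacks) and the noise-free entries of Table 6.

```python
def apply_noise(log2_t, log2_d, log2_pi, biased_state=False):
    """Scale noise-free (log2 T, log2 D) for a noise parameter pi = 2^log2_pi.

    Differential bias attacks grow by pi^-1; chosen-ciphertext biased
    state attacks grow by pi^-2 (hypothetical helper restating the text).
    """
    exponent = 2 if biased_state else 1
    shift = -exponent * log2_pi
    return log2_t + shift, log2_d + shift


# Noise-free chosen-plaintext attack without black-boxed rounds (Table 6).
for log2_pi in (0, -10, -20, -30):
    t, d = apply_noise(57.58, 45.46, log2_pi)
    print(f"pi = 2^{log2_pi}: T = 2^{t:.2f}, D = 2^{d:.2f}")
```

Passing `biased_state=True` with the noise-free pair (32.92, 32.92) gives the chosen-ciphertext row of the same table.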
7 Towards More Alignment: Bytewise Leakage

Here we deal with the case where each execution leaks one byte of a byte-aligned
state. In other words, we now let aligned bytes of internal states leak. Such leaks
better reflect the realities of a byte-oriented software implementation (footnote 3). In both
settings - leakage with fixed and uncertain time/space - our techniques still
apply. However, some adjustments are needed, as described below.

7.1 Fixed Time/Space: Bytewise Multiset Attack

Our bitwise multiset attacks naturally extend to bytewise multiset attacks,
because the multiset characteristics are based on the bytewise XOR-sum property.
The success probability for detecting wrong keys increases from \((1 - 2^{-1})\) to
\((1 - 2^{-8})\) by using the bytewise zero-sum property. Then the time complexities
of the 2, 3, 4, 6 and 7-round attacks are estimated as
\(\{(2^{32} \times 2^{8} \times N) \times 4\} + (1 + 2^{-8N} \times (2^{32} - 1))^{4}\) encryptions.
With N = 4, this is about \(2^{44}\). The time complexities
of the 1 and 8-round attacks and the 9-round attack also improve to \(2^{42}\)
\((\approx 2^{32} \times 2^{8} \times 4 \times 4/3)\) and \(2^{12}\) \((= 2^{8} \times 16)\) encryptions, respectively. The time
complexity of the combined attack is \(2^{45}\) \((\approx 2^{12} + 2^{42} + 2^{42} + 2^{44} + 2^{44})\) encryptions
and the required data is \(2^{35}\) chosen plaintexts and \(2^{34}\) chosen ciphertexts.
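
The estimates in this subsection are easy to re-derive numerically. The sketch below is only an illustration: it evaluates the expression for the 2, 3, 4, 6 and 7-round attacks at N = 4 and adds up the five terms of the combined attack in the log2 domain. The helper `log2_of_sum` is a hypothetical name, and the trailing 4 in the second term is read here as an exponent, which is an assumption about the original typesetting (it does not affect the rounded result).

```python
import math


def log2_of_sum(log2_terms):
    """Return log2 of a sum whose terms are given as base-2 exponents."""
    return math.log2(sum(2.0 ** t for t in log2_terms))


N = 4  # number of multisets per guess, as in the text

# {(2^32 * 2^8 * N) * 4} + (1 + 2^(-8N) * (2^32 - 1))^4
middle_rounds = (2**32 * 2**8 * N) * 4 + (1 + 2.0 ** (-8 * N) * (2**32 - 1)) ** 4
log2_middle = math.log2(middle_rounds)

# Combined attack: 2^12 + 2^42 + 2^42 + 2^44 + 2^44
log2_combined = log2_of_sum([12, 42, 42, 44, 44])

print(f"2,3,4,6,7-round attacks: about 2^{log2_middle:.2f}")
print(f"combined attack:         about 2^{log2_combined:.2f}")
```

The first term dominates the per-attack cost, and the combined figure lands slightly above \(2^{45}\), which the text rounds to \(2^{45}\).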


7.2 Uncertain Time/Space: Differential Bias Attack 

Our differential bias attacks also extend to bytewise attacks using bytewise
differential biases of the truncated differential characteristics of Figs. 4, 7 and 8.


Chosen/Known-Plaintext Differential Bias Attack. Let a leaked byte
from the i-th execution be \(Z^{*}_i\), and let \(N^{*}_{all}\), \(N^{*}_{bias_p}\), \(N^{*}_{bias_n}\) and \(N^{*}_{random}\) be the number
of bytewise pairs in the entire space, positive biased space, negative biased space
and randomly-distributed space, respectively. The probabilities that the difference
of a bytewise pair of randomly chosen leaked bytes is zero (\(\Delta[Z^{*}_i, Z'^{*}_i] = 0\)) for
a correct key and a wrong key are estimated as follows.

\[ \Pr\nolimits_c(\Delta[Z^{*}, Z'^{*}] = 0) = 1/2^{8} \cdot (N^{*c}_{random}/N^{*}_{all}) + N^{*c}_{bias_p}/N^{*}_{all}, \]

\[ \Pr\nolimits_w(\Delta[Z^{*}, Z'^{*}] = 0) = 1/2^{8} \cdot (N^{*w}_{random}/N^{*}_{all}) + N^{*w}_{bias_p}/N^{*}_{all}. \]

Assuming that the target event E is \(\Delta[Z^{*}_i, Z'^{*}_i] = 0\), X is a distribution for
a wrong key, and Y is a distribution for a correct key, p and q are estimated
as \(p = \Pr\nolimits_w(\Delta[Z^{*}, Z'^{*}] = 0)\) and

\[ q = \frac{-N^{*c}_{bias_n} + N^{*w}_{bias_n} + 255\,(N^{*c}_{bias_p} - N^{*w}_{bias_p})}{N^{*}_{all} - N^{*w}_{bias_n} + 255\,N^{*w}_{bias_p}}. \]

For the success probability of \(1 - 2^{-32}\), the required sample size is estimated as


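3 The stream cipher LEX can be regarded as a bytewise leakage model at the fixed
space [3] but the locations of leaked bytes are known for the attacker. Thus, the
attack against LEX [10] is not directly applicable to our unknown location model.

Table 7. Evaluation for byte-aligned space randomization (\(N_s = (N_{all})^{1/2}\))

Chosen-plaintext (ciphertext) differential bias attack

| Round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(N^{c}_{bias_n}\) | \(N^{w}_{bias_n}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| 1 (8) | \(32^2\) | 31 | 28 | 1 | 1 | \(2^{-3.41}\) | \(2^{-3.42}\) | \(2^{19.84}\) | \(2^{45.84}\) | \(2^{39.00}\) CP(CC) |
| 2 (7) | \(32^2\) | 28 | 16 | 4 | 4 | \(2^{-0.74}\) | \(2^{-0.74}\) | \(2^{14.48}\) | \(2^{40.48}\) | \(2^{39.00}\) CP(CC) |
| 3 (6) | \(32^2\) | 16 | 16 | 16 | 0 | \(2^{-8.32}\) | \(2^{-8.38}\) | \(2^{29.64}\) | \(2^{55.64}\) | \(2^{42.00}\) CP |

Known-plaintext differential bias attack

| Round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(N^{c}_{bias_n}\) | \(N^{w}_{bias_n}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| 1 | \(32^2\) | 19 | 16 | 1 | 0 | \(2^{-2.74}\) | \(2^{-2.74}\) | \(2^{18.48}\) | \(2^{44.48}\) | \(2^{40.00}\) KP |

Chosen-ciphertext biased state attack

| Round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| 9 | 32 | 1 | 0 | \(2^{2.99}\) | \(2^{2.99}\) | \(2^{11.00}\) | \(2^{23.00}\) | \(2^{23.00}\) CC |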


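Table 8. Evaluation for byte-aligned time randomization (\(N_s = (N_{all})^{1/2}\))

Chosen-plaintext differential bias attack

| BB round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(N^{c}_{bias_n}\) | \(N^{w}_{bias_n}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| None | \(11^2\) | 3 | 2 | 1 | 0 | \(2^{-1.31}\) | \(2^{-1.31}\) | \(2^{15.62}\) | \(2^{43.16}\) | \(2^{37.46}\) CP |
| 9 | \(10^2\) | 3 | 2 | 1 | 0 | \(2^{-1.26}\) | \(2^{-1.26}\) | \(2^{15.52}\) | \(2^{43.19}\) | \(2^{37.32}\) CP |
| 1, 2, 8, 9 | \(7^2\) | 0 | 0 | 1 | 0 | \(2^{-5.61}\) | \(2^{-5.50}\) | \(2^{24.22}\) | \(2^{52.41}\) | \(2^{36.81}\) CP |

Known-plaintext differential bias attack

| BB round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(N^{c}_{bias_n}\) | \(N^{w}_{bias_n}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| None | \(11^2\) | 1 | 0 | 0 | 0 | \(2^{1.08}\) | \(2^{1.08}\) | \(2^{13.00}\) | \(2^{40.54}\) | \(2^{38.46}\) KP |

Chosen-ciphertext biased state attack

| BB round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| None | 11 | 1 | 0 | \(2^{4.50}\) | \(2^{4.50}\) | \(2^{11.00}\) | \(2^{23.00}\) | \(2^{23.00}\) CC |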


\(N_{sample} \approx 32 \cdot 256 \cdot q^{-2} = 2^{13} \cdot q^{-2}\). Time complexity is estimated as \(2^{31} \times N_{sample}/N_s\)
encryptions and the required data is \(2^{34}\ (2^{35}) \times N_s\) chosen/known plaintexts with
leaked bits.

Chosen-Ciphertext Biased-State Attack. Assuming that the target event
E is \(Z^{*}_i = 0\), p and q are estimated as \(p = 1/2^{8}\) and \(q = (255 \times N^{c}_{bias_p})/N_{all}\). The
number of required samples is estimated as \(N_{sample} \approx 8 \cdot 2^{8} \cdot q^{-2}\). We repeat
the procedure for all 16 bytes of $10. Therefore, time complexity is estimated as
\(2^{12} \times N_{sample}\) \((= 16 \times 2^{8} \times N_{sample})\) encryptions and the amount of required
data is \(2^{12} \times N_{sample}\) \((= 16 \times 2^{8} \times N_{sample})\) chosen ciphertexts.
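
The bytewise estimates can be sketched in Python as follows. The function names are hypothetical. The relative bias is computed as \(q = \Pr_c/\Pr_w - 1\), which is an assumption about how q relates to the two probabilities (the lemmas defining it lie outside this section); with the count-based probabilities above it reproduces the tabulated values for rows where q is small, for example the round 2 (7) row of Table 7 and the chosen-ciphertext row of Table 9.

```python
import math


def bytewise_bias(n_all, bp_c, bp_w, bn_c, bn_w):
    """Return (p, q) for the event 'difference of a leaked byte pair is zero'.

    bp_* / bn_* are the positive / negative biased pair counts for the
    correct (c) and wrong (w) key. Assumes q = Pr_c / Pr_w - 1.
    """
    def pr_zero(bias_p, bias_n):
        n_random = n_all - bias_p - bias_n      # uniformly distributed pairs
        return (n_random / 2**8 + bias_p) / n_all

    p = pr_zero(bp_w, bn_w)
    q = pr_zero(bp_c, bn_c) / p - 1
    return p, q


def log2_samples_differential(q):
    """log2 of N_sample ~ 2^13 * q^-2 (differential bias attack)."""
    return 13 - 2 * math.log2(abs(q))


def log2_samples_biased_state(n_all, bp_c):
    """log2 of N_sample ~ 8 * 2^8 * q^-2 with q = 255 * N^c_biasp / N_all."""
    q = 255 * bp_c / n_all
    return 11 - 2 * math.log2(q)


# Space randomization, round 2 (7): N_all = 32^2, counts 28, 16, 4, 4.
p, q = bytewise_bias(32**2, 28, 16, 4, 4)
print(f"log2 q = {math.log2(q):.2f}, log2 N_sample = {log2_samples_differential(q):.2f}")

# Space and time randomization, chosen-ciphertext attack: N_all = 32 * 11.
print(f"log2 N_sample = {log2_samples_biased_state(32 * 11, 1):.2f}")
```

Small differences in the second decimal place against the tables come from rounding q before squaring.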


Security Under Time and Space Randomization and with Leakage 
Noise. The results of security evaluations under time and space randomization 






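Table 9. Evaluation for byte-aligned space and time randomization (\(N_s = (N_{all})^{1/2}\))

Chosen-plaintext differential bias attack

| BB round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(N^{c}_{bias_n}\) | \(N^{w}_{bias_n}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| None | \((32 \cdot 11)^2\) | 215 | 200 | 25 | 4 | \(2^{-5.52}\) | \(2^{-5.53}\) | \(2^{24.04}\) | \(2^{46.58}\) | \(2^{42.45}\) CP |
| 9 | \((32 \cdot 10)^2\) | 199 | 184 | 25 | 4 | \(2^{-5.29}\) | \(2^{-5.29}\) | \(2^{23.58}\) | \(2^{46.26}\) | \(2^{42.32}\) CP |
| 1, 2, 8, 9 | \((32 \cdot 8)^2\) | 112 | 112 | 16 | 0 | \(2^{-12.52}\) | \(2^{-12.58}\) | \(2^{38.04}\) | \(2^{61.23}\) | \(2^{41.80}\) CP |

Known-plaintext differential bias attack

| BB round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(N^{c}_{bias_n}\) | \(N^{w}_{bias_n}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| None | \((32 \cdot 11)^2\) | 179 | 176 | 5 | 4 | \(2^{-7.79}\) | \(2^{-7.74}\) | \(2^{28.58}\) | \(2^{52.12}\) | \(2^{43.45}\) KP |

Chosen-ciphertext biased state attack

| BB round | \(N_{all}\) | \(N^{c}_{bias_p}\) | \(N^{w}_{bias_p}\) | \(q\) | \(q^{(e)}\) | \(N_{sample}\) | \(T\) | \(D\) |
| None | \(32 \cdot 11\) | 1 | 0 | \(2^{-0.46}\) | \(2^{-0.46}\) | \(2^{11.92}\) | \(2^{23.92}\) | \(2^{23.92}\) CC |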


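Table 10. Evaluation for byte-aligned leakage with noise (\(N_s = (N_{all})^{1/2} \times \pi^{-1}\))

Chosen-plaintext differential bias attack

| BB round | Time (\(\pi = 1\)) | Data (\(\pi = 1\)) | Time (\(\pi = 2^{-10}\)) | Data (\(\pi = 2^{-10}\)) | Time (\(\pi = 2^{-20}\)) | Data (\(\pi = 2^{-20}\)) | Time (\(\pi = 2^{-30}\)) | Data (\(\pi = 2^{-30}\)) |
| None | \(2^{46.58}\) | \(2^{42.45}\) CP | \(2^{56.58}\) | \(2^{52.45}\) CP | \(2^{66.58}\) | \(2^{62.45}\) CP | \(2^{76.58}\) | \(2^{72.45}\) CP |
| 1, 2, 8, 9 | \(2^{61.23}\) | \(2^{41.80}\) CP | \(2^{71.23}\) | \(2^{51.80}\) CP | \(2^{81.23}\) | \(2^{61.80}\) CP | \(2^{91.23}\) | \(2^{71.80}\) CP |

Known-plaintext differential bias attack

| BB round | Time (\(\pi = 1\)) | Data (\(\pi = 1\)) | Time (\(\pi = 2^{-10}\)) | Data (\(\pi = 2^{-10}\)) | Time (\(\pi = 2^{-20}\)) | Data (\(\pi = 2^{-20}\)) | Time (\(\pi = 2^{-30}\)) | Data (\(\pi = 2^{-30}\)) |
| None | \(2^{50.28}\) | \(2^{43.45}\) KP | \(2^{60.28}\) | \(2^{53.45}\) KP | \(2^{70.28}\) | \(2^{63.45}\) KP | \(2^{80.28}\) | \(2^{73.45}\) KP |

Chosen-ciphertext biased state attack

| BB round | Time (\(\pi = 1\)) | Data (\(\pi = 1\)) | Time (\(\pi = 2^{-10}\)) | Data (\(\pi = 2^{-10}\)) | Time (\(\pi = 2^{-20}\)) | Data (\(\pi = 2^{-20}\)) | Time (\(\pi = 2^{-30}\)) | Data (\(\pi = 2^{-30}\)) |
| None | \(2^{23.92}\) | \(2^{23.92}\) CC | \(2^{43.92}\) | \(2^{43.92}\) CC | \(2^{63.92}\) | \(2^{63.92}\) CC | \(2^{83.92}\) | \(2^{83.92}\) CC |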


with noisy leakage are provided in Tables 7, 8, 9 and 10 (footnote 4). In all cases, time
complexity and data requirements are improved compared to the bit-aligned
attacks.

8 Some Extensions 

8.1 AES-192 and 256 

Bitwise multiset attacks and differential bias attacks on AES-128 are directly
applicable to AES-192 and AES-256 in both fixed and random settings. In the
backward direction, the 6- to 9-round attacks on AES-128 correspond to 8- to
11-round ones on AES-192 and 10- to 13-round ones on AES-256, respectively.

8.2 Multiple-Bit Leakage 

Here we consider the case where M bits of the bit-aligned state information leak
in each execution for a small M. Let \(Z^{i}_{1}, \ldots, Z^{i}_{M}\) be the M leaked bits of the i-th
execution.

4 If q is not small, Lemmata 1 and 2 are not applicable [16]. In this case we estimate
\(N_{sample} = 2^{13}\) and \(2^{11}\) for known-plaintext differential bias attacks and
chosen-ciphertext biased state attacks, respectively, which are the values used in
Tables 7 and 8. We confirmed experimentally that these
numbers of samples were enough for successful attacks.

Bitwise Multiset Attack: Assume that \(Z^{i}_{0}, Z^{i}_{1}, \ldots, Z^{i}_{M-1}\) come from different
but fixed locations of the state. If the XOR sum of the \(2^{8}\) multiset of each location
is zero, the XOR sum of the whole set of \(2^{8} \times M\) bits is also zero. Thus, bitwise multiset
attacks are feasible as long as leaked bits come from the space where each XOR sum
is zero only for a correct key. Time and data complexities are almost the same.

Differential Bias Attack: Assume that \(Z^{i}_{1}, \ldots, Z^{i}_{M}\) come from randomly-chosen
different locations of the state. Since the attacker is able to obtain M
bits in each execution, the required data reduces by a factor of M.


8.3 Other Granularities 

So far, we have assumed that a leak can only occur after a full round. However,
at other granularities, such as leaks after SubBytes or MixColumns, our bitwise
multiset attacks and differential bias attacks still work.

Bitwise Multiset Attack: According to Proposition 1, any bit of the states
between #3 and #10 has the zero-sum property if the key is correctly guessed.
Using the difference in zero-sum properties between the correct and wrong key cases,
bitwise multiset attacks are applicable to other states in the same manner.

Differential Bias Attack: By properly choosing attack parameters, our differential
bias attacks are also made feasible. For instance, if bits of the states after
SubBytes are additionally leaked, the parameters of chosen-plaintext differential
bias attacks on AES-128 with the space and time randomization are estimated as
\(N_{all} = (256 \times 11 + 128 \times 10)^2\), \(N^{c}_{bias_p} = 2032\) \((= 216 + 216 + 1408 + 96 + 96)\),
\(N^{w}_{bias_p} = 1792\) \((= 96 + 96 + 1408 + 96 + 96)\), \(N^{c}_{bias_n} = 400\) \((= 168 + 168 + 0 + 32 + 32)\),
\(N^{w}_{bias_n} = 64\) \((= 0 + 0 + 32 + 32)\), and \(q = 2^{-16.10}\). The number of required samples
is estimated as \(N_{sample} = 2^{38.02}\) \((= 2^{6} \times 2^{16.01 \cdot 2})\). With \(N_s = (N_{all})^{1/2}\), time
complexity is \(2^{57.02}\) \((= (2^{31} \times 2^{38.02})/(256 \times 11 + 128 \times 10))\) encryptions and the
required data is \(2^{46}\) \((= 2^{34} \times (256 \times 11 + 128 \times 10))\) chosen plaintexts.

Abstract. The iterated Even-Mansour scheme (IEM) is a generalization of
the basic 1-round proposal (ASIACRYPT '91). The scheme can use one
key, two keys, or completely independent keys.

Most of the published security proofs for IEM against related-key and
chosen-key attacks focus on the case where all the round-keys are derived
from a single master key. Results beyond this barrier, however, are relevant
to the cryptographic problem of whether a secure blockcipher with key-size
twice the block-size can be built by mixing two relatively independent
keys into IEM and iterating sufficiently many rounds; this strategy
has actually been used in designing blockciphers for a long time.

This work makes the first step towards breaking this barrier and considers
IEM with Interleaved Double independent round-keys:

\[ IDEM_r((k_1, k_2), m) = k_i \oplus P_r(\ldots k_1 \oplus P_2(k_2 \oplus P_1(k_1 \oplus m)) \ldots), \]

where i = 2 when r is odd, and i = 1 when r is even. As results, this
work proves that 15 rounds can achieve (full) indifferentiability from an
ideal cipher with an \(O(q^{8}/2^{n})\) security bound. This work also proves that 7
rounds are sufficient and necessary to achieve sequential-indifferentiability
(a notion introduced at TCC 2012) with an \(O(q^{6}/2^{n})\) security bound, so
that IDEM7 is already correlation intractable and secure against any
attack that exploits evasive relations between its input-output pairs.


Keywords: Blockcipher • Ideal cipher • Indifferentiability • Key-alternating
cipher • Even-Mansour cipher • Correlation intractability


1 Introduction 

Blockciphers are arguably the most important primitives in cryptography.
A blockcipher \(BC[\kappa, n] : \{0,1\}^{\kappa} \times \{0,1\}^{n} \to \{0,1\}^{n}\) maps a \(\kappa\)-bit key K and


D. Lin — A full version is available [GL15b]. 

© International Association for Cryptologic Research 2015 

T. Iwata and J.H. Cheon (Eds.): ASIACRYPT 2015, Part II, LNCS 9453, pp. 389-410, 2015. 
DOI: 10.1007/978-3-662-48800-3-16 




C. Guo and D. Lin 


an n-bit input x to an n-bit output y. For each key K, the map \(BC[\kappa, n](K, \cdot)\) is
a permutation, and is efficiently invertible.

Most of the existing blockcipher designs can be roughly split into two families,
namely Feistel ciphers and substitution-permutation networks. The latter
is known as the structure of AES, and can be generalized as key-alternating
ciphers [DR02] / iterated Even-Mansour ciphers (IEM for short). An r-round IEM
cipher \(IEM_r\) consists of r fixed n-bit permutations \(P_i\) separated by key additions:


\[ IEM_r(K, m) = k_r \oplus P_r(\ldots k_2 \oplus P_2(k_1 \oplus P_1(k_0 \oplus m)) \ldots). \]


The single-round Even-Mansour (the case r = 1) was developed in 1991 [EM93]
in an attempt to turn a single permutation into a family of permutations (a
blockcipher). IEM1 has been proved pseudorandom when the underlying permutation
is random and public while the keys are secret. Since then, a surge of studies on
IEM has appeared (especially in the last half decade), for instance, on
minimization [DKS13, CLL+14], on pseudorandomness [BKL+12, Ste12, LPS12,
CS14], on related-key (RK) security [FP15, CS15], and on attacks (notable
examples include [DKS13, DDKS15, DDKS14]). The pseudorandomness results showed
that IEM is provably secure in the traditional single secret key setting.

Indifferentiability of IEM. The studies on indifferentiability and sequential-indifferentiability
(seq-indifferentiability) of IEM are mainly motivated by further
validating the SPN-based blockcipher design methodology by proving IEM
secure against known-key and chosen-key (CK) attacks, in which the adversary
knows and chooses keys and tries to exhibit non-randomness. Roughly speaking,
indifferentiability of IEM means that IEM can be as secure as an ideal
cipher [MRH04], whereas seq-indifferentiability of IEM implies that IEM is
correlation intractable [CGH04], and that there is no relation between the inputs and
outputs of IEM that can be exploited by an attack (even a chosen-key one) [MPS12].
Here the ideal cipher \(IC[\kappa, n] : \{0,1\}^{\kappa} \times \{0,1\}^{n} \to \{0,1\}^{n}\) is taken randomly
from the set of \(((2^{n})!)^{2^{\kappa}}\) possible choices of \(BC[\kappa, n]\). In this work, IC[2n, n] will
be referred to as E.

As to (seq-)indifferentiability, we are aware of four works: [ABD+13],
[LS13], [CS15], and [Ste15]. [ABD+13] showed that IEM5 is indifferentiable from
\(IC[\kappa, n]\), if the round-key is derived from a preimage-aware key derivation
function (KDF). On the other hand, [LS13] and [CS15] concentrated on single-key
EM (SEM) in which the user-provided n-bit master key is directly used at
each round: [LS13] proved that SEM12 (12-round SEM; similarly for SEM4 and
SEMg) is indifferentiable, while [CS15] proved that SEM4 is seq-indifferentiable.
In [Ste15], Steinberger proved the indifferentiability of SEMg. Results on SEM
are closer to concrete designs, since they can be easily generalized to the case
where each round-key is derived by an efficiently invertible permutation.

Problem: Even-Mansour with Two Keys. Existing works on provable security
of IEM in RK and CK settings almost all focus on the SEM context: [LS13]
(ASIACRYPT 2013), [FP15] (FSE 2015), [CS15] (EUROCRYPT 2015) (except
for those that consider a KDF modeled as a random oracle, e.g. [ABD+13]). This work
makes the first step towards breaking this barrier and considers the following
problem: can we obtain an ideal cipher by mixing two independent keys into
IEM and iterating enough rounds? (a problem left open by Lampe and Seurin
(LS) [LS13]; see footnote 1). This problem is far from trivial because all the works
on SEM (in RK and CK settings) crucially rely on the correlation between
all round-keys, so that they cannot be directly generalized to the double-key case.
Also, the independence between round-keys may bring in weakness - the most
extreme case is IEM with completely independent round-keys, which is vulnerable
to trivial related-key attacks. This problem is also practical since the idea
is really used in existing designs such as AES-256 [DR02], Serpent [ABK98],
and LED-128 [GPPR11] - note that they (certainly) mix the keys into the state
by lightweight and efficient operations and iterate, rather than use some very
complex hash function to seal the 2n key bits first. The intuition is that by
iterating enough rounds, such designs will be "secure"; but the fact that the
diffusion of the 2n key bits is relatively slow raises doubts (e.g. doubts on
AES-256 [KHP07, BDK+10]). The fact that among the three AES variants, AES-256
was the first to be theoretically broken [BK09] seems to support such doubts,
and this attack raises the question of whether there exists a BC[2n, n] design
behaving like IC[2n, n] (footnote 2); due to this, it is necessary to either validate (using a security
proof) or negate (using a generic attack) this intuitive methodology.

To work towards a solution, note that using one key in the first n/2 rounds while
using the other in the last n/2 rounds is trivially insecure [LS13]. Instead, a
(seemingly) more promising approach to mixing two keys into IEM is the idea
behind LED-128 [GPPR11], that is, interleaving the xoring of them: we name it
the interleaved double-key Even-Mansour cipher (IDEM for short; see Fig. 1 for an
illustration). More formally, the r-round variant is written as follows:


\[ IDEM_r((k_1, k_2), m) = k_t \oplus P_r(\ldots k_2 \oplus P_3(k_1 \oplus P_2(k_2 \oplus P_1(k_1 \oplus m)))), \]


where t = 2 when r is odd, and t = 1 when r is even. LS viewed IDEM as
a promising solution to the problem mentioned before, and gave an extremely
preliminary analysis, which led to the conjecture that 15 rounds are sufficient to
achieve indifferentiability; but no concrete proof exists. Moreover, the provable
security of IDEM with fewer rounds has not been considered yet.
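
To make the interleaved key schedule concrete, here is a minimal toy sketch in Python. It is not part of the original construction's analysis: the 8-bit block size, the seeded pseudorandom permutation tables and all function names are assumptions chosen purely for illustration. The code adds \(k_1\) before odd-indexed permutations and \(k_2\) before even-indexed ones, and finishes with \(k_t\) as in the formula above.

```python
import random

N_BITS = 8  # toy block size; the construction itself is defined for n-bit blocks


def make_permutations(rounds, seed=0):
    """Build `rounds` toy permutations as (table, inverse_table) pairs."""
    rng = random.Random(seed)
    perms = []
    for _ in range(rounds):
        table = list(range(2 ** N_BITS))
        rng.shuffle(table)
        inverse = [0] * len(table)
        for x, y in enumerate(table):
            inverse[y] = x
        perms.append((table, inverse))
    return perms


def round_key(i, k1, k2):
    """Key added before P_i: k1 for odd i, k2 for even i."""
    return k1 if i % 2 == 1 else k2


def idem_encrypt(perms, k1, k2, m):
    state = m
    for i, (table, _) in enumerate(perms, start=1):
        state = table[state ^ round_key(i, k1, k2)]
    r = len(perms)
    # Final whitening key k_t: t = 2 when r is odd, t = 1 when r is even.
    return state ^ (k2 if r % 2 == 1 else k1)


def idem_decrypt(perms, k1, k2, c):
    r = len(perms)
    state = c ^ (k2 if r % 2 == 1 else k1)
    for i in range(r, 0, -1):
        _, inverse = perms[i - 1]
        state = inverse[state] ^ round_key(i, k1, k2)
    return state


perms7 = make_permutations(7)
ciphertext = idem_encrypt(perms7, 0x3A, 0xC5, 0x11)
print(idem_decrypt(perms7, 0x3A, 0xC5, ciphertext) == 0x11)
```

The toy permutations stand in for the public random permutations of the model; nothing here says anything about the security of a concrete cipher.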

Contributions. We give the first indifferentiability proof for 15-round IDEM.
This is the first main result of this work. Interestingly, this matches LS's
conjecture, but the proof is obtained by an approach quite different from what they expected.

To obtain security guarantees for cases with fewer rounds, we prove that IDEM7 is
seq-indifferentiable from IC[2n, n]; therefore, IDEM7 is also correlation
intractable in the random permutation model [MPS12], and resists all attacks
that exploit evasive relations between its inputs and outputs. We also find a
sequential distinguisher against IDEM6 (which is actually an easy extension of
LS's attack against SEM3 [LS13]), so that 7 rounds are also necessary. All the
results are summarized by the following informal theorem.

1 A trivial solution to building IC[2n, n] by IEM is hash-then-encrypt, which has been
included in [ABD+13]. It was also discussed in [CDMS10]. But this solution imposes
a heavy burden on the key derivation and is far from practical designs.

2 Please see [CDMS10], page 275: as of 2009 it is unclear if we have a candidate
block-cipher with key-size larger than block-size that behaves like an ideal cipher.

Theorem. For the construction IDEM based on completely independent
random permutations, 6 rounds is not (seq-)indifferentiable; 7 rounds is
seq-indifferentiable from IC[2n, n] with an \(O(q^{6}/2^{n})\) security bound, and is also
correlation intractable in the random permutation model; 15 rounds is indifferentiable
from IC[2n, n] with an \(O(q^{8}/2^{n})\) security bound.

Due to the independence between the two n-bit round keys, at present we
are not sure whether the results can be generalized to IEM with "very general"
key schedules; however, for the first time, these results indeed validate the
(seemingly long-standing) design principle to some extent in the open-key model, i.e.
a secure blockcipher BC[2n, n] can be built from key-alternating ciphers without
using very complex KDFs, or even without any KDF. In particular, they show
that the intuition behind the key schedule of LED-128 is sound. However, they
certainly cannot provide a direct security guarantee for LED-128 - in fact, as
theoretical results, they do not guarantee the security of ANY concrete blockcipher.
As already mentioned, whether there exist designs that "behave like"
IC[2n, n] has to be supported by more (cryptanalysis) work.

Techniques. To prove indifferentiability and seq-indifferentiability, one first
builds a simulator to mimic the behaviors of all the underlying permutations.
Taking IDEM15 as an example, consider a sequence of pairs of input and output
(IO for short) \((x_1, y_1), \ldots, (x_{15}, y_{15})\) (called a computation path/chain) of
the 15 permutations simulated by the simulator, which satisfies \(y_i \oplus x_{i+1} = k_2\)
when i is odd, and \(y_i \oplus x_{i+1} = k_1\) when i is even. The simulator should ensure
that each such chain simulated by it matches the ideal cipher E, i.e.
\(E((k_1, k_2), x_1 \oplus k_1) = y_{15} \oplus k_2\). The basic idea to reach this goal is Coron et al.'s
simulation via chain completion technique [CHK+14], which has achieved success
in (weaker) indifferentiability proofs for a variety of idealized blockciphers. It
requires the simulator S to detect partial computation chains formed by the
queries of the distinguisher, and to complete the chains in advance by querying
the ideal cipher E, such that S is ready for answering queries in the future. To
simulate answers that are consistent with E, S has to use the answer from E to
define some simulated answers; this action is called adaptation.

Detect Chains. To detect the so-called partial chains, note that the construction
IDEM has the following property: given 4 values of 3 permutations \(y_i\), \(x_{i+1}\), \(y_{i+1}\),
and \(x_{i+2}\) (namely, an output of \(P_i\), a pair of IO of \(P_{i+1}\), and an input to \(P_{i+2}\)),
the two associated keys can be derived as \(k = y_i \oplus x_{i+1}\) and \(k' = y_{i+1} \oplus x_{i+2}\),
and it is possible to move forward and backward along the path. By this, at
some place, using three rounds for chain detection is necessary - this idea has
already appeared in [LS13].

Overall Strategy of the Simulators. As to the overall strategy, the simulator
used to prove seq-indifferentiability of IDEM7 is quite close to those for 6-round
Feistel [MPS12] and SEM4 [CS15]: the simulator detects partial chains at the
three middle rounds of IDEM7, completes them forward or backward, and finally
adapts them at the first or last round - depending on concrete contexts.
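
The chain-detection property can be illustrated with a small toy example in Python. This is only a sketch under stated assumptions: the 8-bit permutation tables, the fixed keys and the helper names are hypothetical and stand in for the simulated permutations. It shows that the four values \(y_i, x_{i+1}, y_{i+1}, x_{i+2}\) determine both keys, after which the chain can be extended in either direction.

```python
import random


def random_permutation(rng, n_bits=8):
    """Toy stand-in for a public random permutation (lookup table)."""
    table = list(range(2 ** n_bits))
    rng.shuffle(table)
    return table


def derive_keys(y_i, x_next, y_next, x_next2):
    """Keys implied by y_i, the IO pair (x_{i+1}, y_{i+1}) and x_{i+2}."""
    return y_i ^ x_next, y_next ^ x_next2


rng = random.Random(2015)
perms = [random_permutation(rng) for _ in range(7)]   # P_1 .. P_7
k1, k2 = 0x3A, 0xC5

# Build a full chain (x_1, y_1), ..., (x_7, y_7) for the message 0x11.
xs, ys = [], []
x = 0x11 ^ k1
for i, perm in enumerate(perms, start=1):
    y = perm[x]
    xs.append(x)
    ys.append(y)
    x = y ^ (k2 if i % 2 == 1 else k1)   # y_i xor x_{i+1} = k_2 (i odd) or k_1 (i even)

# Detect a partial chain at the three middle rounds (i = 3, 4, 5).
i = 3
k, k_prime = derive_keys(ys[i - 1], xs[i], ys[i], xs[i + 1])

# Move forward: x_{i+3} = P_{i+2}(x_{i+2}) xor k.
x_forward = perms[i + 1][xs[i + 1]] ^ k
# Move backward: y_{i-1} = x_i xor k'.
y_backward = xs[i - 1] ^ k_prime

print((k, k_prime) == (k2, k1), x_forward == xs[i + 2], y_backward == ys[i - 2])
```

Since i = 3 is odd here, the first derived key equals \(k_2\) and the second equals \(k_1\); for an even i the roles are swapped.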

On the other hand, the simulator used to prove the indifferentiability of
IDEM15 is motivated by Steinberger's illustration of the indifferentiability proof for
SEMg [Ste15]. The overall strategy requires detecting chains both at the two sides
and at the middle - which is similar to several previous works (e.g. [CHK+14]).
The core idea in this part is a so-called "pureness" property which is different
from [CHK+14]: the simulator may fall into a recursive chain completion process;
however, during each such recursive completion process, all the partial chains are
to be adapted at the same round; as a consequence, when a partial chain is to be
completed, its extension is necessarily due to the simulator defining new simulated
answers as random ones rather than due to the adaptation of some other chains, so
that the "endpoints" of this chain are always random. In the context
of IDEM, however, uniquely specifying a chain requires at least 3 values of 3 consecutive
permutations, so that the adversary has more freedom to choose values and make
different chains collide. With this in mind, we arrange two rounds to surround
each adaptation zone to ensure that different chains diverge in the adaptation zone;
following an old convention [CHK+14], we call them buffer rounds (footnote 3). For a more
detailed overview of the simulator, we refer to Sect. 3.1.

In the indifferentiability proof for IDEM15, we use an active-chain-oriented
strategy for defining bad events, which is motivated by the analysis of IDEM7: we
directly define some bad events with respect to the chains that are active
during the completion process. This helps us achieve the \(O(q^{8}/2^{n})\)
indifferentiability security bound in spite of the complex character of IDEM. Albeit loose,
this bound compares favorably with similar (full) indifferentiability
proofs for idealized blockciphers (the best non-flawed one(s) among them
reached the level of \(O(q^{10}/2^{n})\) [ABD+13]).

Summary: What is Inherited and What is Novel? Technically speaking, we
inherit the simulation via chain completion technique, the randomness mapping
argument, and the basic idea for the simulator termination argument from [CHK+14];
we also inherit (and adapt) the overall frameworks of Steinberger (which dates
back to Seurin [Seu09]) and Cogliati et al. [CS15] (which dates back to Mandal
et al. [MPS12]). Our novelties mainly lie in the proof for IDEM15: first, we use a
bad event to establish a slightly tighter bound on the size of the history (\(O(q^{2}/2^{n})\))
and the simulator's complexity; second, we define the bad events to be so-called
active-chain-oriented, so that the probability can be very low (\(O(q^{6}/2^{n})\)).
Together, the two enable us to establish the \(O(q^{8}/2^{n})\) security bound.


3 But our “buffer” rounds deviate from those in [CHK+14], in the sense that the
values in them can be defined when the simulator is completing other chains.
Organization. Section 2 presents preliminaries. Section 3 contains the first main
result, the indifferentiability of IDEM15, and the proof sketch. Section 4 contains
the second main result, the seq-indifferentiability of IDEM7. Section 5
concludes. Due to page constraints, the full proofs of the two main results have
to be deferred to the full version of this paper [GL15b].


2 Preliminaries and Notations 


This work focuses on BC[2n, n], that is, blockciphers with $n$-bit blocks and $2n$-bit
keys. Throughout the remainder, the $n$-bit round-keys are denoted by lower-case
letters, i.e. $k_1$ and $k_2$, while the $2n$-bit master key is interchangeably denoted by
the capital letter $K$ or the concatenation $(k_1, k_2)$ (as the reader has seen).

An $n$-bit random permutation is a permutation that is uniformly selected
from all $(2^n)!$ possible choices. In this work, the notation $\mathbf{P} = (P_1, \ldots, P_j)$
is used to denote a tuple of random permutations ($j = 15$ in the context of
IDEM15, and $j = 7$ in the context of IDEM7), and we let $\mathbf{P}$ provide a unified
interface $\mathbf{P}.P(i, \delta, z) : \{1, \ldots, j\} \times \{+, -\} \times \{0,1\}^n \to \{0,1\}^n$, where $i$ indicates
the index, $\delta \in \{+, -\}$ indicates direct query or inverse query, and $z \in \{0,1\}^n$
is the queried value. On the other hand, the interface of the ideal cipher $E$ is
$E.E(\delta, K, z) : \{+, -\} \times \{0,1\}^{2n} \times \{0,1\}^n \to \{0,1\}^n$.
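To make the unified interface concrete, the following is a minimal sketch of a tuple of lazily sampled random permutations exposing `P(i, delta, z)`. The class name `LazyPermutations`, the use of Python integers for $n$-bit strings, and the rejection-sampling loop are illustrative assumptions, not part of the paper.

```python
import random


class LazyPermutations:
    """Hypothetical helper: j independent n-bit random permutations, sampled lazily."""

    def __init__(self, j, n, seed=None):
        self.j, self.n = j, n
        self.rng = random.Random(seed)
        # Index 0 is unused so that permutation i lives at position i.
        self.fwd = [dict() for _ in range(j + 1)]  # x -> y
        self.bwd = [dict() for _ in range(j + 1)]  # y -> x

    def P(self, i, delta, z):
        """Answer a direct (delta == '+') or inverse (delta == '-') query on P_i."""
        assert 1 <= i <= self.j and delta in ("+", "-") and 0 <= z < 2 ** self.n
        if delta == "+":
            table, inverse = self.fwd[i], self.bwd[i]
        else:
            table, inverse = self.bwd[i], self.fwd[i]
        if z not in table:
            # Draw uniformly among the values not yet used as answers in this direction.
            while True:
                w = self.rng.randrange(2 ** self.n)
                if w not in inverse:
                    break
            table[z] = w
            inverse[w] = z
        return table[z]
```

Each answer is drawn only when first needed, and forward and inverse queries stay consistent with each other.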


Indifferentiability. The indifferentiability framework [MRH04] addresses the
idealized construction in settings where the underlying parameters are exposed
to the adversary. For concreteness, consider $\mathrm{IDEM}_{15}^{\mathbf{P}}$: a distinguisher $D^{\mathrm{IDEM}_{15}^{\mathbf{P}},\mathbf{P}}$
with oracle access to the cipher and the underlying primitives is trying to
distinguish $\mathrm{IDEM}_{15}^{\mathbf{P}}$ from $E$. Then the formal definition is as follows.

Definition 1 (Indifferentiability). The idealized blockcipher $\mathrm{IDEM}_{15}^{\mathbf{P}}$ with
oracle access to ideal primitives $\mathbf{P}$ is said to be statistically and strongly
$(q, \sigma, t, \varepsilon)$-indifferentiable from an ideal cipher $E$ if there exists a simulator $S^E$
s.t. $S$ makes at most $\sigma$ queries to $E$, runs in time at most $t$, and for any
distinguisher $D$ which issues at most $q$ queries, it holds

$$\left| \Pr\left[D^{\mathrm{IDEM}_{15}^{\mathbf{P}},\mathbf{P}} = 1\right] - \Pr\left[D^{E,S^E} = 1\right] \right| \le \varepsilon.$$

Such a result means that $\mathrm{IDEM}_{15}^{\mathbf{P}}$ can safely replace $E$ in most “natural” settings,
although this belief does not necessarily hold when the resources of the adversary
are limited [RSS11,DGHM13]. Since its introduction, the indifferentiability framework
has been applied to various constructions, including variants of Merkle-Damgård,
Feistel [CHK+14], Sponge [BDPVA08], and IEM [ABD+13,LS13].

To formally define seq-indifferentiability, we first specify a restricted
distinguisher class, namely the sequential distinguisher (seq-distinguisher) [MPS12].
Consider $\mathrm{IDEM}_7^{\mathbf{P}}$ and $D^{\mathrm{IDEM}_7^{\mathbf{P}},\mathbf{P}}$. $D$ is sequential if it issues queries in a specific
order: (1) queries the underlying primitives $\mathbf{P}$ as it wishes; (2) queries $\mathrm{IDEM}_7^{\mathbf{P}}$
as it wishes; (3) outputs, and cannot query $\mathbf{P}$ again. This order is illustrated
by the italic numbers in Fig. 3. We then define the notion of the total oracle query
cost of $D$, which equals the total number of queries received by $\mathbf{P}$ (from $D$ or
$\mathrm{IDEM}_7^{\mathbf{P}}$) when $D$ interacts with $(\mathrm{IDEM}_7^{\mathbf{P}}, \mathbf{P})$ [MPS12]. Then, the definition of
seq-indifferentiability can be obtained from the definition of (full)
indifferentiability by restricting the distinguisher to the class of sequential ones, and
replacing the query cost of the distinguisher by the total oracle query cost.
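The query order and the total oracle query cost can be illustrated with a small wrapper. This is only a sketch under assumptions: the class `SequentialOracles`, its method names, and the convention that the construction is built from a handle to the primitive are hypothetical and chosen for illustration.

```python
class SequentialOracles:
    """Hypothetical wrapper: enforces the sequential order and counts the total cost."""

    def __init__(self, primitive, construction_factory):
        self.total_cost = 0  # queries received by the primitive, from D or the construction
        self.phase = 1       # 1: primitive queries allowed; 2: construction queries only
        self._primitive = primitive
        # The construction receives a counting handle to the primitive.
        self._construction = construction_factory(self._counted)

    def _counted(self, *args):
        self.total_cost += 1
        return self._primitive(*args)

    def query_primitive(self, *args):
        if self.phase != 1:
            raise RuntimeError("a sequential distinguisher cannot query P again")
        return self._counted(*args)

    def query_construction(self, *args):
        self.phase = 2  # from now on, direct primitive queries are forbidden
        return self._construction(*args)
```

A construction query that internally triggers two primitive calls raises `total_cost` by two, which is what distinguishes the total oracle query cost from the distinguisher's own query count.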

Definition 2 (Seq-indifferentiability). The idealized blockcipher $\mathrm{IDEM}_7^{\mathbf{P}}$
with oracle access to ideal primitives $\mathbf{P}$ is said to be statistically and strongly
$(q, \sigma, t, \varepsilon)$-seq-indifferentiable from an ideal cipher $E$ if there exists a simulator
$\mathcal{S}^E$ s.t. $\mathcal{S}$ makes at most $\sigma$ queries to $E$, runs in time at most $t$, and for any
sequential distinguisher $D$ of total oracle query cost at most $q$, it holds

$$\left| \Pr\left[D^{\mathrm{IDEM}_7^{\mathbf{P}},\mathbf{P}} = 1\right] - \Pr\left[D^{E,\mathcal{S}^E} = 1\right] \right| \le \varepsilon.$$


Sequential-indifferentiability implies correlation intractability [MPS12,CS15]. 


3 Indifferentiability for 15-Round IDEM 

We prove the first main theorem of this paper in this section, which is: 

Theorem 1. The 15-round Even-Mansour cipher $\mathrm{IDEM}_{15}$ from fifteen independent
random permutations $\mathbf{P} = (P_1, \ldots, P_{15})$ and two $n$-bit keys $(k_1, k_2)$
alternately xored is strongly and statistically $(q, \sigma, t, \varepsilon)$-indifferentiable from an ideal
cipher IC[2n, n], where $\sigma = 2^{10} \cdot q^8$, $t = O(q^8)$, and $\varepsilon \le$ […] $= O(q^8/2^n)$.

As usual, we first present the simulator, then sketch the proof. 


3.1 The Simulator 

To build the simulator, we borrow a variant of the explicit randomness technique
[CHK+14] from [CS15], that is, letting the simulator $S$ query $\mathbf{P}$ as explicit
randomness. We denote by $S^{E,\mathbf{P}}$ the simulator for IDEM15 which takes $\mathbf{P}$ as
randomness source and interacts with $E$. $S^{E,\mathbf{P}}$ provides an interface $S.P(i, \delta, z)$
($i \in \{1, \ldots, 15\}$) which is exactly the same as that of $\mathbf{P}$. As argued in [ABD+13, CS15], using
such explicit randomness is actually equivalent to lazily sampling in advance
before the experiment.

We now give a high-level overview of the simulator $S^{E,\mathbf{P}}$ (depicted in Fig. 1
(left)). $S$ maintains a history for each simulated permutation in the form
of fifteen sets $P_1, \ldots, P_{15}$. Each of the sets has entries of the form $(x, y)$ for
$x, y \in \{0,1\}^n$. $S$ will ensure that for any $z \in \{0,1\}^n$ and $i \in \{1, \ldots, 15\}$, there
is at most one $z' \in \{0,1\}^n$ such that $(z, z') \in P_i$, and vice versa; once such
consistency cannot be kept, $S$ aborts (as will be discussed later). By this, the sets
$\{P\} = \{P_1, \ldots, P_{15}\}$ are expected to define fifteen partial permutations, and we
denote by $P_i^+$ ($P_i^-$, resp.) the (time-dependent) set of all $n$-bit values $x$ ($y$, resp.)
satisfying that $\exists z \in \{0,1\}^n$ s.t. $(x, z) \in P_i$ ($(z, y) \in P_i$, resp.); denote by $P_i^+(x)$
($P_i^-(y)$, resp.) the corresponding value of $z$ (as mentioned, the uniqueness of $z$
is ensured by $S$).
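The bookkeeping for one set $P_i$ can be sketched as a pair of dictionaries, one per direction. The names `PartialPermutation`, `force_val` and `SimulatorAbort` are illustrative assumptions; the sketch only mirrors the consistency rule described above (never overwrite, abort on a clash).

```python
class SimulatorAbort(Exception):
    """Raised when the history can no longer define a partial permutation."""


class PartialPermutation:
    """Hypothetical history of one simulated permutation P_i."""

    def __init__(self):
        self.plus = {}   # plays the role of P_i^+ : x -> y
        self.minus = {}  # plays the role of P_i^- : y -> x

    def force_val(self, x, y):
        """Add the pair (x, y); entries are never overwritten."""
        if x in self.plus or y in self.minus:
            raise SimulatorAbort("pair (%r, %r) breaks consistency" % (x, y))
        self.plus[x] = y
        self.minus[y] = x
```

Membership `x in p.plus` corresponds to $x \in P_i^+$, and `p.plus[x]` to $P_i^+(x)$.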
Queries that have already appeared in the history will be instantly answered
with the contents of $\{P\}$. Upon a new query $S^{E,\mathbf{P}}.P(i, \delta, z)$, $S^{E,\mathbf{P}}$ queries $\mathbf{P}$ to
draw $z' = \mathbf{P}.P(i, \delta, z)$ as the answer and calls a procedure ForceVal($z, z', \delta, i$)
to add $z$ and $z'$ to $P_i$; inside this procedure, if $z'$ is found to be already in $P_i$,
$S^{E,\mathbf{P}}$ aborts due to the broken consistency (as mentioned). Then, if $(i, \delta) \in
\{(2, +), (6, -), (10, +), (14, -)\}$, the query satisfies the chain detection conditions, so that
$S^{E,\mathbf{P}}$ enqueues and completes chains formed by previous queries to ensure that
it is ready to simulate answers consistent with those of $E$ in the future.

The cases where $(i, \delta)$ equals $(2, +)$ and $(14, -)$ are similar: taking the former
$P(2, +, x_2)$ as an example, $S^{E,\mathbf{P}}$ considers all tuples $(x_1, y_1, x_{14}, y_{14}, x_{15}, y_{15})$
such that $(x_j, y_j) \in P_j$ for $j \in \{1, 14, 15\}$, recovers two keys $k_2 := y_1 \oplus x_2$ and
$k_1 := y_{14} \oplus x_{15}$, computes $y_0 := x_1 \oplus k_1$ and $x_{16} := y_{15} \oplus k_2$, checks whether
$E.E(+, (k_1, k_2), y_0) = x_{16}$ via an inner procedure $S$.Check, and enqueues a
5-tuple $(y_0, k_1, k_2, 0, 4)$ into a queue ChainQueue when this is the case. In this
tuple, the 4-th value 0 informs $S$ that the first value of the tuple is $y_0$, and the
last value 4 indicates that when completing the chain characterized by the tuple
$(y_0, k_1, k_2, 0)$, $S$ should add the adapted pair to $P_4$ to ensure consistency with $E$.
The action towards answering a new query $P(14, -, y_{14})$ is symmetric: $S$ considers
all tuples $(x_1, y_1, x_2, y_2, x_{15}, y_{15})$ such that $(x_j, y_j) \in P_j$ for $j \in \{1, 2, 15\}$, recovers
the two keys, calls $S$.Check, and enqueues $(y_0, k_1, k_2, 0, 12)$ into ChainQueue
when Check returns true. The chain represented by this 5-tuple will be adapted
at $P_{12}$, which is different from the case $(i, \delta) = (2, +)$.

The other two cases $P(6, -, y_6)$ and $P(10, +, x_{10})$ are similar by symmetry:
in each case, $S$ considers all tuples $(x_7, y_7, x_8, y_8, x_9, y_9)$ such that $(x_j, y_j) \in P_j$
for $j \in \{7, 8, 9\}$, computes $k_1 := y_8 \oplus x_9$ and $k_2 := y_7 \oplus x_8$, checks whether
$x_7 \oplus k_1 = y_6 \wedge y_9 \oplus k_2 \in P_{10}^+$ (in case $P(6, -, y_6)$) or $x_7 \oplus k_1 \in P_6^- \wedge y_9 \oplus k_2 = x_{10}$
(in case $P(10, +, x_{10})$), and enqueues $(y_7, k_1, k_2, 7, l)$ into ChainQueue when this
is the case, where $l = 4$ in case $P(6, -, y_6)$ and $l = 12$ in case $P(10, +, x_{10})$.
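The detection of “middle” chains described above can be sketched as follows. The function name `detect_middle_chains` and the representation of the histories (sets of pairs for $P_7, P_8, P_9$, plain sets of values for $P_6^-$ and $P_{10}^+$) are assumptions made for illustration; the triple loop is the naive enumeration, not an optimized one.

```python
def detect_middle_chains(i, delta, z, P6_minus, P7, P8, P9, P10_plus):
    """Return the 5-tuples (y7, k1, k2, 7, l) to enqueue after a new query (i, delta, z).

    P7, P8, P9 are sets of (x, y) pairs; P6_minus holds the known y6 values and
    P10_plus the known x10 values. n-bit strings are modelled as integers.
    """
    found = []
    for (x7, y7) in P7:
        for (x8, y8) in P8:
            for (x9, y9) in P9:
                k1 = y8 ^ x9  # since x9 = y8 xor k1
                k2 = y7 ^ x8  # since x8 = y7 xor k2
                if (i, delta) == (6, "-") and z == x7 ^ k1 and (y9 ^ k2) in P10_plus:
                    found.append((y7, k1, k2, 7, 4))   # adapted at P_4
                elif (i, delta) == (10, "+") and z == y9 ^ k2 and (x7 ^ k1) in P6_minus:
                    found.append((y7, k1, k2, 7, 12))  # adapted at P_12
    return found
```

A chain is reported only when the five middle rounds 6 to 10 are all consistent with a single key pair $(k_1, k_2)$.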

After enqueuing, $S$ starts an execution of RecursiveCompletion, during
which it keeps taking the tuples out of ChainQueue and completing the
associated partial chains until ChainQueue is empty again. More precisely, for each
chain $C$ dequeued from the queue, $S$ evaluates in the IDEM15 computation
path both forward and backward and queries $E$ once to “wrap” around, until
obtaining $x_l$ (when moving forward) or $y_l$ (when moving backward). $S$ then calls
ForceVal($x_l, y_l, \bot, l$) to add $(x_l, y_l)$ to $P_l$ as a newly defined input-output pair, so that
the entire computation path is consistent with the answers of $E$. Inside this call
to ForceVal, if $x_l \in P_l^+$ or $y_l \in P_l^-$ before they are to be added, $S$ aborts (also
as mentioned).

During the completion of a chain, $S$ adds new entries to $P_i$ which are necessary
for its evaluation. Such new values also trigger new chains to be enqueued
when they satisfy the chain detection conditions mentioned before. For this, note
that $S$ continues dequeuing and completing chains until ChainQueue is empty
again. To avoid re-completing the same chain, $S$ maintains a set CompSet to
keep a record of what it has completed, and a chain $C$ dequeued from the queue
will be completed only if $C \notin$ CompSet. After all the work above is finished,
$S$ answers the original query with $P_i^\delta(z)$.
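The control flow of the recursive completion is a plain worklist loop. The sketch below assumes a hypothetical callback `complete` that performs the actual chain completion (possibly appending new chains to the queue) and returns the 4-tuple representations of the chain it has just completed; these names are not taken from the paper.

```python
from collections import deque


def recursive_completion(chain_queue, comp_set, complete):
    """Drain chain_queue, skipping chains whose 4-tuple is already in comp_set.

    chain_queue: deque of 5-tuples (y_j, k1, k2, j, l).
    comp_set:    set of 4-tuples (y_j, k1, k2, j) recording completed chains.
    complete:    assumed callback; may enqueue further chains and returns the
                 set of 4-tuples describing the chain it completed.
    """
    while chain_queue:
        y, k1, k2, j, l = chain_queue.popleft()
        if (y, k1, k2, j) not in comp_set:
            comp_set.update(complete(y, k1, k2, j, l))
```

Because completed chains are recorded under every representation returned by `complete`, an equivalent chain that was enqueued later under a different index is skipped when it is dequeued.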

Note that throughout the process, the entries in S.{P} are never overwritten; 
once S finds itself unable to maintain consistency any more, S just aborts. 

The pseudocode of $S^{E,\mathbf{P}}$ and a modified simulator $\widetilde{S}^{\widetilde{E},\mathbf{P}}$ (please see Sect. 3.2)
is presented as follows. When a line has a boxed variant next to it, $S^{E,\mathbf{P}}$ uses
the original code, whereas $\widetilde{S}^{\widetilde{E},\mathbf{P}}$ uses the boxed one.


3.2 The Key Points of the Proof 

As in previous works, for any fixed, deterministic (see footnote 4), and computationally
unbounded distinguisher $D$, we first show that the complexity of $S^{E,\mathbf{P}}$ is
polynomial except with negligible probability, then show that the simulated system
$\Sigma_1(E, S^{E,\mathbf{P}})$ and the real system $\Sigma_3(\mathrm{IDEM}_{15}^{\mathbf{P}}, \mathbf{P})$ are indistinguishable.

Intermediate System. The proof uses an intermediate system $\Sigma_2(\widetilde{E}^E, \widetilde{S}^{\widetilde{E},\mathbf{P}})$
which consists of two modified constructions $\widetilde{E}^E$ and $\widetilde{S}^{\widetilde{E},\mathbf{P}}$. $\widetilde{E}^E$ can be seen as
an ideal cipher $E$ enhanced with a history maintaining mechanism and a Check
procedure. More precisely, $\widetilde{E}^E$ offers the same interface as $E$ and relays the answers of
$E$, except that it uses a set $ES$ to keep the history of these queries. The entries in
$ES$ are of the form $(x, y, (k_1, k_2))$. $\widetilde{E}^E$ provides an additional interface Check
to $\widetilde{S}$; upon a call to Check($x, y, K$), $\widetilde{E}^E$ checks whether $(x, y, K) \in ES$ and
answers accordingly. On the other hand, the modified simulator $\widetilde{S}^{\widetilde{E},\mathbf{P}}$ shares
the same main strategy with $S^{E,\mathbf{P}}$ except that it queries $\widetilde{E}^E$; in particular, it
calls $\widetilde{E}$.Check whenever necessary. The code of $\widetilde{S}$ is presented along with $S$ in
Sect. 3.1. The three systems are depicted in Fig. 2.

Since all the entries of $ES$ actually come from (an ideal cipher) $E$, $ES$ always
defines a partial cipher, and we use a notation system similar to that for $\{P\}$.
More precisely, we denote by $ES^+$ the set of tuples $(K, x)$ s.t. $\exists y : (x, y, K) \in
ES$, and denote by $ES^+(K, x)$ the corresponding $y$. Similarly for $ES^-$ and
$ES^-(K, y)$. Finally, denote by $|\widetilde{E}.ES|$ the size of $\widetilde{E}.ES$.
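The history-keeping cipher can be sketched as a thin wrapper. The class name `CipherWithHistory` and the callable signature `E(delta, K, z)` are assumptions for illustration. The point of the sketch is that `check` answers from the recorded history only and never queries $E$, in contrast to the Check procedure of the original simulator, which queries $E$.

```python
class CipherWithHistory:
    """Hypothetical wrapper around an ideal cipher E, recording its query history."""

    def __init__(self, E):
        self._E = E      # assumed callable E(delta, K, z) with delta in {'+', '-'}
        self.ES = set()  # entries (x, y, K): E maps x to y under key K

    def E(self, delta, K, z):
        """Relay the query to the underlying cipher and record it."""
        w = self._E(delta, K, z)
        self.ES.add((z, w, K) if delta == "+" else (w, z, K))
        return w

    def check(self, x, y, K):
        """Answer from the history only; no query to the underlying cipher is made."""
        return (x, y, K) in self.ES
```

With this wrapper, `check` can return false for a pair that the cipher would in fact map correctly, simply because the pair has not been queried yet.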

Complexity of $\widetilde{S}$ in $\Sigma_2$, and Transition from $\Sigma_1$ to $\Sigma_2$. The starting
point is the same as [CHK+14]: the number of “external” chains $(y_0, k_1, k_2, 0)$
completed by $\widetilde{S}$ is bounded by the number of queries of $D$ to $\widetilde{E}$, which is at most
$q$; by this, for $i \in \{6, 7, 8, 9, 10\}$, $P_i$ consists of entries due to $D$'s queries and $\widetilde{S}$
completing chains $(y_0, k_1, k_2, 0)$, so that $|P_i| \le 2q$.

Then the argument deviates from [CHK+14]: by construction, $\widetilde{S}$ enqueues
a “middle” chain $(y_7, k_1, k_2, 7, l)$ only if there are 5 entries $(x_i, y_i) \in P_i$ for
$i = 6, 7, 8, 9, 10$ s.t. $y_6 = x_7 \oplus y_8 \oplus x_9$ and $y_7 \oplus x_8 \oplus y_9 = x_{10}$. Consider a bad event
BadLockMid, which happens in $D^{\Sigma_2}$ if any call to InnerP creates a new pair of


4 This is wlog since the advantage of any probabilistic distinguisher cannot exceed the 
advantage of the corresponding optimal deterministic version. 
1: Simulator S^{E,P}:        [boxed variants: Simulator S̃^{Ẽ,P}]
2: Variables
3:   Sets {P} = {P_1, ..., P_15} and CompSet, initially empty
4:   Queue ChainQueue, initially empty
5: public procedure P(i, δ, z)
6:   z' := InnerP(i, δ, z)  // Chains are enqueued in this step
7:   RecursiveCompletion()
8:   return z'
9: // The recursive completion process is extracted as an individual procedure.
10: private procedure RecursiveCompletion()
11:   while ChainQueue ≠ ∅ do
12:     (y_j, k_1, k_2, j, l) := ChainQueue.Dequeue()
13:     if (y_j, k_1, k_2, j) ∉ CompSet then
14:       Complete(y_j, k_1, k_2, j, l)
15: // The “inner” permutation interface used by S itself.
16: private procedure InnerP(i, δ, z)
17:   if z ∉ P_i^δ then
18:     z' := P.P(i, δ, z)
19:     ForceVal(z, z', δ, i)
20:     EnqueueChains(i, δ, z)
21:   return P_i^δ(z)
22: // Procedure that enqueues chains.
23: private procedure EnqueueChains(i, δ, z)
24:   if (i, δ) = (2, +) then
25:     forall ((x_1, y_1), x_2, y_14, (x_15, y_15)) ∈ P_1 × {z} × P_14^- × P_15 do
26:       k_2 := y_1 ⊕ x_2
27:       k_1 := y_14 ⊕ x_15
28:       y_0 := x_1 ⊕ k_1
29:       x_16 := y_15 ⊕ k_2
30:       flag := Check(y_0, x_16, (k_1, k_2))    [boxed: flag := Ẽ.Check(y_0, x_16, (k_1, k_2))]
31:       if flag = true then
32:         ChainQueue.Enqueue(y_0, k_1, k_2, 0, 4)
33:   else if (i, δ) = (14, -) then
34:     forall ((x_1, y_1), x_2, y_14, (x_15, y_15)) ∈ P_1 × P_2^+ × {z} × P_15 do
35:       k_2 := y_1 ⊕ x_2
36:       k_1 := y_14 ⊕ x_15
37:       y_0 := x_1 ⊕ k_1
38:       x_16 := y_15 ⊕ k_2
39:       flag := Check(y_0, x_16, (k_1, k_2))    [boxed: flag := Ẽ.Check(y_0, x_16, (k_1, k_2))]
40:       if flag = true then
41:         ChainQueue.Enqueue(y_0, k_1, k_2, 0, 12)
42:   else if (i, δ) = (6, -) ∨ (i, δ) = (10, +) then
43:     forall ((x_7, y_7), (x_8, y_8), (x_9, y_9)) ∈ P_7 × P_8 × P_9 do
44:       k_1 := y_8 ⊕ x_9
45:       k_2 := y_7 ⊕ x_8
46:       if (i, δ) = (6, -) ∧ z = x_7 ⊕ k_1 ∧ y_9 ⊕ k_2 ∈ P_10^+ then
47:         ChainQueue.Enqueue(y_7, k_1, k_2, 7, 4)
48:       else if z = y_9 ⊕ k_2 ∧ x_7 ⊕ k_1 ∈ P_6^- then  // (i, δ) = (10, +)
49:         ChainQueue.Enqueue(y_7, k_1, k_2, 7, 12)
50: private procedure Complete(y_j, k_1, k_2, j, l)
51:   (y_{l-1}, k_1, k_2, l - 1) := EvalFwd(y_j, k_1, k_2, j, l - 1)
52:   (y_l, k_1, k_2, l) := EvalBwd(y_j, k_1, k_2, j, l)
53:   ForceVal(y_{l-1} ⊕ k_2, y_l, ⊥, l)  // Always k_2, since l ∈ {4, 12}.
54:   (y_0, k_1, k_2, 0) := EvalFwd(y_j, k_1, k_2, j, 0)
55:   (y_7, k_1, k_2, 7) := EvalFwd(y_0, k_1, k_2, 0, 7)
56:   CompSet := CompSet ∪ {(y_0, k_1, k_2, 0), (y_7, k_1, k_2, 7)}
57: // Procedure that adds entries to {P}.
58: private procedure ForceVal(z, z', δ, l)
59:   // When δ = ⊥ then it's an adaptation
60:   if z ∈ P_l^δ ∨ z' ∈ P_l^δ̄ then abort
61:   else if δ = - then P_l := P_l ∪ {(z', z)}
62:   else P_l := P_l ∪ {(z, z')}  // δ = + or ⊥
63: private procedure Check(x, y, K)  // S̃ does not own such a procedure
64:   return E.E(+, K, x) = y
65: // Two procedures that help evaluate forward and backward respectively in IDEM.
66: private procedure EvalFwd(y_j, k_1, k_2, j, l)
67:   while j ≠ l do
68:     if j = 15 then
69:       x_16 := y_15 ⊕ k_2
70:       y_0 := E.E(-, (k_1, k_2), x_16)    [boxed: y_0 := Ẽ.E(-, (k_1, k_2), x_16)]
71:       j := 0
72:     else
73:       if j is odd then
74:         y_{j+1} := InnerP(j + 1, +, y_j ⊕ k_2)
75:       else
76:         y_{j+1} := InnerP(j + 1, +, y_j ⊕ k_1)
77:       j := j + 1
78:   return (y_l, k_1, k_2, l)
79: private procedure EvalBwd(y_j, k_1, k_2, j, l)
80:   while j ≠ l do
81:     if j = 0 then
82:       x_16 := E.E(+, (k_1, k_2), y_0)    [boxed: x_16 := Ẽ.E(+, (k_1, k_2), y_0)]
83:       y_15 := x_16 ⊕ k_2
84:       j := 15
85:     else
86:       if j is odd then
87:         y_{j-1} := InnerP(j, -, y_j) ⊕ k_1
88:       else
89:         y_{j-1} := InnerP(j, -, y_j) ⊕ k_2
90:       j := j - 1
91:   return (y_l, k_1, k_2, l)
Fig. 1. (left) IDEM15 with the zones where the simulator detects chains and adapts
them; (right) IDEM7 and how the simulator for sequential indifferentiability works.

Fig. 2. Systems used in the indifferentiability proof for IDEM15.


3-tuples $((x_7, y_7), (x_8, y_8), (x_9, y_9)) \in P_7 \times P_8 \times P_9$ and $((x_7', y_7'), (x_8', y_8'), (x_9', y_9')) \in
P_7 \times P_8 \times P_9$ such that $x_7 \oplus y_8 \oplus x_9 = x_7' \oplus y_8' \oplus x_9'$ and $y_7 \oplus x_8 \oplus y_9 = y_7' \oplus x_8' \oplus y_9'$.
Taking all possibilities into consideration, its overall probability is at most […];
and conditioned on $\neg$BadLockMid, each pair $(y_6, x_{10}) \in
P_6^- \times P_{10}^+$ corresponds to at most one tuple $((x_7, y_7), (x_8, y_8), (x_9, y_9)) \in P_7 \times
P_8 \times P_9$ s.t. $y_6 = x_7 \oplus y_8 \oplus x_9$ and $y_7 \oplus x_8 \oplus y_9 = x_{10}$, hence $\widetilde{S}$ enqueues at most
$|P_6| \cdot |P_{10}| \le 4q^2$ “middle” chains $(y_7, k_1, k_2, 7, l)$. By this, each $|P_i|$ is bounded
by $O(q^2)$, $|\widetilde{E}.ES|$ by $5q^2$, and $\widetilde{S}$ issues at most $(5q^2)^4$ queries to $\widetilde{E}$.Check.

The rest of the first transition is very close to [CHK+14] (and is almost
the same as [GL15a]): for two executions $D^{\Sigma_1}$ and $D^{\Sigma_2}$ which share the same
random primitives $(E, \mathbf{P})$, conditioned on $\neg$BadLockMid, if the first $(5q^2)^4$ calls
to $S$.Check in $D^{\Sigma_1}$ obtain the same answers as the first $(5q^2)^4$ calls to $\widetilde{E}$.Check
in $D^{\Sigma_2}$ ($\Pr \ge 1 - 1250q^8/2^n$), then $D$ outputs the same in $D^{\Sigma_1}$ and $D^{\Sigma_2}$.

Lemma 1. For any distinguisher $D$ which issues at most $q$ queries, it holds:

(i) $\left|\Pr[D^{\Sigma_1(E,S)} = 1] - \Pr[D^{\Sigma_2(\widetilde{E},\widetilde{S})} = 1]\right| \le$ […];

(ii) during the execution, with probability at least $1 -$ […], $S$ issues
at most $2^{10} \cdot q^8$ queries to $E$, and runs in time at most $O(q^8)$.

The most difficult part of the proof is the transition from $\Sigma_2$ to $\Sigma_3$, which
will be presented in the next paragraph.

Transition from $\Sigma_2$ to $\Sigma_3$: Non-abortion Argument for $\widetilde{S}$. $\widetilde{S}$ is built with
the expectation that if it does not abort, then the outputs of $\Sigma_2$ and $\Sigma_3$ are
indistinguishable; we will see that this intuition is true, so that the first (and
actually core) step of the transition is to show that the abort probability of $\widetilde{S}$ is
negligible. For this, we introduce several notions and (improbable) bad events,
then show that if none of them happens, then $\widetilde{S}$ does not abort.

Random Assignments. Similarly to [LS13], we use the notion random forward
assignment in set $P_i$ (random backward assignment in set $P_i$, resp.) to refer to
line 62 inside any execution of ForceVal($z, z', +, i$) (line 61 inside any execution
of ForceVal($z, z', -, i$), resp.), and use the notion random forward (backward,
resp.) assignment in set $ES$ to refer to any operation sequence $z' := E.E(+, K, z)$
($z' := E.E(-, K, z)$, resp.) and then adding $z$ and $z'$ to $ES$. We also use random
assignments to indifferently refer to the forward or backward case.

Partial Chains. In this paper, the partial chains are characterized by 4-tuples
$(y_i, k_1, k_2, i) \in \{0,1\}^n \times \{0,1\}^n \times \{0,1\}^n \times \{0, \ldots, 15\}$. Besides this notion, we
borrow two helper functions $val_l^+$ and $val_l^-$ from previous works: w.r.t. the values
in the given sets $ES$ and $\{P\}$, $val_l^+$ and $val_l^-$ take a partial chain as input,
and evaluate forward and backward respectively (wrapping around through $ES$ if
necessary) until obtaining the corresponding input value $x_l$ to $P_l$ and input value
$y_l$ to $P_l^{-1}$ respectively, or the evaluation is blocked by some missing entry (in the
sets), and return $x_l$ and $y_l$ respectively in the former case and $\bot$ in the latter
case. Based on $val_l^+$ and $val_l^-$, we borrow (and redefine) the notion of equivalent
partial chains: w.r.t. $\{P\}$ and $\widetilde{E}.ES$, two partial chains $C = (y_i, k_1, k_2, i)$ and
$D = (y_j, k_1, k_2, j)$ (with the same keys) are equivalent (denoted $C \equiv D$) if
$y_i = val_i^-(D)$ or $y_j = val_j^-(C)$ (see footnote 5).
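As an illustration of the forward helper function, the following sketch evaluates a partial chain through partial histories and returns `None` in place of $\bot$ when it is blocked. The function name `val_plus`, the dictionary layout (`P_plus[i]` maps $x_i$ to $y_i$, `ES_minus` maps `(K, x16)` to $y_0$), and the fixed round count of 15 are assumptions; the key schedule follows the EvalFwd procedure ($k_2$ after odd rounds, $k_1$ after even rounds).

```python
ROUNDS = 15  # IDEM15; the key xored after round 15 is k2


def val_plus(chain, l, P_plus, ES_minus):
    """Evaluate chain = (y_j, k1, k2, j) forward until the input x_l of P_l is known.

    Returns x_l, or None (standing for the bottom symbol) if an entry is missing.
    """
    y, k1, k2, j = chain
    for _ in range(ROUNDS + 2):  # at most one full trip around the cycle
        if j == ROUNDS:
            # Wrap around through the cipher history: x16 = y15 xor k2.
            y = ES_minus.get(((k1, k2), y ^ k2))
            if y is None:
                return None
            j = 0
            continue
        x = y ^ (k2 if j % 2 == 1 else k1)  # input of P_{j+1}
        if j + 1 == l:
            return x
        y = P_plus[j + 1].get(x)
        if y is None:
            return None  # blocked by a missing entry
        j += 1
    return None
```

The backward helper is symmetric, using the inverse tables and the forward cipher history.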

Bad Events, and Non-abortion. A random answer from $\mathbf{P}$ or $E$ is bad if it collides
with some value relevant to the “active” chains. To specify such “active”
chains, we define a notion history for partial chains $CH$ from the sets $ES$ and
$\{P_7, P_8, P_9\}$ at the moment where the random answer is drawn: $CH$ is the union
of two sets $ECH$ and $MCH$, where $ECH$ includes all the tuples $(y_0, k_1, k_2, 0)$
with $((k_1, k_2), y_0) \in ES^+$, and $MCH$ includes all $(y_7, k_1, k_2, 7)$ with $y_7 \in P_7^-$,
$x_8 = y_7 \oplus k_2 \in P_8^+$, and $x_9 = P_8^+(x_8) \oplus k_1 \in P_9^+$. By the complexity analysis,
conditioned on $\neg$BadLockMid, $|CH| \le 5q^2 + (2q)^3 \le 13q^3$.
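The arithmetic behind this bound can be spelled out as follows. This is a sketch assuming $q \ge 1$ and using the bounds stated earlier in this section: $|\widetilde{E}.ES| \le 5q^2$ for the “external” part, and $|P_i| \le 2q$ for $i \in \{7, 8, 9\}$ together with the observation that a tuple in $MCH$ is determined by the triple $(y_7, x_8, x_9)$.

```latex
\begin{align*}
|CH| &\le |ECH| + |MCH| \\
     &\le |ES| + |P_7| \cdot |P_8| \cdot |P_9| \\
     &\le 5q^2 + (2q)^3 = 5q^2 + 8q^3 \\
     &\le 5q^3 + 8q^3 = 13q^3 \qquad (q \ge 1).
\end{align*}
```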

We then list the bad events (more precisely, their ideas). Due to space
constraints, we have to defer their formal definitions to the full version [GL15b].

- BadHitAdapt: an answer from $\mathbf{P}$ collides with a previously adapted value;

- BadE: an answer from $E$ collides with a value in $P_1$ or $P_{15}$ xored with the key, i.e.
$E.E(-, (k_1, k_2), x_{16}) \oplus k_1 \in P_1^+$, or $E.E(+, (k_1, k_2), y_0) \oplus k_2 \in P_{15}^-$;

- BadP: the extension of some chain $C$ meets an old P-tuple after a random
assignment in $\{P\}$ with the same direction, i.e. $\exists C \in CH$ s.t. for more than one
value $i$, $val_i^+(C)$ ($val_i^-(C)$, resp.) differs after a random forward (backward,
resp.) assignment in $\{P\}$ from before the assignment;

- BadInvP: some chain $C$ extends after a random assignment in $\{P\}$ with the
opposite direction, i.e. $\exists C \in CH$ s.t. for some value $i$, $val_i^+(C)$ ($val_i^-(C)$,
resp.) differs after a random backward (forward, resp.) assignment in $\{P\}$
from before the assignment;

- BadMidP: a random assignment in $P_7$, $P_8$, or $P_9$ creates a new 5-tuple $(y_6,
(x_7, y_7), (x_8, y_8), (x_9, y_9), x_{10}) \in P_6^- \times P_7 \times P_8 \times P_9 \times P_{10}^+$ such that $y_6 \oplus x_7 =
y_8 \oplus x_9$ and $y_7 \oplus x_8 = y_9 \oplus x_{10}$;

- BadlyCollide (a term from [CHK+14]): two chains $C, D \in CH$ that are not
equivalent suddenly satisfy $val_i^{\pm}(C) = val_i^{\pm}(D)$ after a random assignment.

The overall probability (the event BadLockMid included) cumulates to — 


5 Note that if $C \equiv D$ then both $y_i = val_i^-(D)$ and $y_j = val_j^-(C)$.
We call a pair of primitives $(E, \mathbf{P})$ a good $\Sigma_2$-tuple if none of the bad events
above (BadLockMid included) happens during the execution $D^{\Sigma_2(\widetilde{E}^E, \widetilde{S}^{\widetilde{E},\mathbf{P}})}$, and
call $D^{\Sigma_2}$ with good $\Sigma_2$-tuples good $\Sigma_2$ executions. During good $\Sigma_2$ executions,
$\widetilde{S}$ never aborts due to calls to ForceVal($z, z', +, i$)/ForceVal($z, z', -, i$), as
otherwise BadHitAdapt happens. We then proceed to argue that $\widetilde{S}$ never aborts
due to calls to ForceVal($x_l, y_l, \bot, l$) (i.e. adaptations: $l \in \{4, 12\}$), to complete
the non-abortion argument.


Lemma 2. In a good execution $D^{\Sigma_2}$, before any call to ForceVal($x_l, y_l, \bot, l$)
($l \in \{4, 12\}$), $x_l \notin P_l^+ \wedge y_l \notin P_l^-$ must hold.

Proof. The proof flow is very similar to [CHK+14], although some ideas deviate
slightly. We sketch the flow: wlog consider a call ForceVal($x_4, y_4, \bot, 4$), and
suppose that it is made during an execution of Complete($C$, 4). We argue that
$val_4^+(C) \notin P_4^+$ right before the call to ForceVal($x_4, y_4, \bot, 4$); the argument
for $val_4^-(C) \notin P_4^-$ is similar by symmetry. The ideas are as follows:

First, before $C$ is enqueued, $val_3^+(C) = \bot$ (this implies $val_4^+(C) = \bot \notin P_4^+$):
if $C = (y_0, k_1, k_2, 0)$ is enqueued by InnerP($2, +, x_2$), then $val_3^+(C) = \bot$ is clear;
if $C = (y_7, k_1, k_2, 7)$ is enqueued by InnerP($6, -, y_6$), then if $val_3^+(C) \ne \bot$,
a chain $(y_0, k_1, k_2, 0)$ equivalent to $C$ must have been previously enqueued and
completed due to the call to InnerP($2, +, val_2^+(C)$), and $C$ would have been
in CompSet when $C$ is dequeued; as a consequence the purported call to
ForceVal($x_4, y_4, \bot, 4$) would not happen.

Second, if $val_4^+(C) \in P_4^+$ when $C$ is dequeued, it can only be that for another
chain $D$ enqueued before $C$ is enqueued and dequeued after $C$ is enqueued, it
holds $val_4^+(D) = val_4^+(C) \ne \bot$, so that $val_4^+(C)$ was added to $P_4^+$ during $D$'s
completion.

Then, we argue that $val_4^+(D) = val_4^+(C) \ne \bot$ is not possible for any such
chain $D$, so that when $C$ is dequeued, either $val_4^+(C) = \bot$, or $val_4^+(C) \ne \bot \wedge
val_4^+(C) \notin P_4^+$. To argue about this, we exclude the possibility for each of the
following cases:

(i) if $val_2^+(C) \ne val_2^+(D)$ at some point, then $val_3^+(C) \ne val_3^+(D)$. Otherwise,
consider the last assignment before $val_3^+(C) = val_3^+(D) \ne \bot$ holds. This
assignment happens earliest right before $C$ is enqueued, at which point
both $C$ and $D$ have been in $CH$ (by construction and definition). Then:

- it cannot have been in $ES$, otherwise BadE happens;

- it cannot have been a random backward assignment in $\{P\}$, otherwise
BadInvP happens;

- it cannot have been a random forward assignment in $\{P\}$, otherwise
BadlyCollide happens;

- it cannot have been due to a previous adaptation, since by construction,
when $C$ is enqueued, all the chains in ChainQueue are to be adapted
at $P_4$, which is the same as for $C$, so that it cannot be that $val_3^+(C) = \bot$ or
$val_3^+(D) = \bot$ due to a missing entry in $P_{12}$ which is later added by an
adaptation in this period.

Then a similar discussion further establishes $val_4^+(C) \ne val_4^+(D)$;

(ii) if $val_2^+(C) = val_2^+(D) \ne \bot$ while $val_3^+(C) \ne val_3^+(D)$ at some point, then
similarly to Case (i), $val_4^+(C) \ne val_4^+(D)$ will hold;

(iii) if $val_2^+(C) = val_2^+(D) \ne \bot$ and $val_3^+(C) = val_3^+(D) \ne \bot$, then $val_4^+(C) \ne
val_4^+(D)$, otherwise $C \equiv D$ and $C \in$ CompSet when $C$ is dequeued.

Finally, after $C$ is dequeued, if $val_4^+(C) = \bot$, then since $\neg$BadP, it can only be
changed to non-empty by a random forward assignment in $P_3$ which occurs during
the completion of $C$, after which $val_4^+(C) \notin P_4^+$. This completes the proof. □

The Rest Part. During $D^{\Sigma_2}$, if $\widetilde{S}$ does not abort, then the answers are consistent
with some $\Sigma_3$ executions. By a randomness mapping argument [CHK+14], the
advantage of $D$ in distinguishing $\Sigma_2$ and $\Sigma_3$ is bounded.


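Lemma 3. For any distinguisher $D$ which issues at most $q$ queries, it holds:

$$\left| \Pr\left[D^{\Sigma_3(\mathrm{IDEM}_{15}^{\mathbf{P}},\mathbf{P})} = 1\right] - \Pr\left[D^{\Sigma_2(\widetilde{E},\widetilde{S})} = 1\right] \right| \le \frac{2^{14} \cdot q^6}{2^n}.$$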


4 Seq-indifferentiability for 7-Round IDEM 

According to [LS13, ABD+13], there is a seq-distinguisher for $\mathrm{SEM}_3$. Consider
IDEM6. If we fix the key $k_2$ to an arbitrary value, then the construction is
reduced to a 3-round single-key Even-Mansour. By this, a seq-distinguisher for
IDEM6 is easily obtained.
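The reduction can be checked exhaustively on a toy block size. The sketch below assumes the key schedule used by the evaluation procedures of Sect. 3.1 ($k_1$ before $P_1$, keys alternating, so that a 6-round instance ends with $k_1$) and assumes that the fixed key is the one xored between the paired permutations; the function names are hypothetical.

```python
import random


def idem(perms, k1, k2, x):
    """r-round IDEM on integers: k1 before P_1, then keys alternate (assumed schedule)."""
    keys = (k1, k2)
    for idx, p in enumerate(perms):
        x = p[x ^ keys[idx % 2]]
    return x ^ keys[len(perms) % 2]  # final key: k1 if r is even, k2 if r is odd


def single_key_em(perms, k, x):
    """Single-key iterated Even-Mansour: the same key before, between and after rounds."""
    for p in perms:
        x = p[x ^ k]
    return x ^ k


def merge_pairs(perms, c):
    """Fold P_{2t-1}, the constant c, and P_{2t} into one public permutation each."""
    size = len(perms[0])
    return [[perms[t + 1][perms[t][x] ^ c] for x in range(size)]
            for t in range(0, len(perms), 2)]
```

With the second key fixed to a constant `c`, `idem(perms, k1, c, x)` on six permutations agrees with `single_key_em(merge_pairs(perms, c), k1, x)` on three merged permutations, which is the 3-round single-key structure the text refers to.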

It is natural to ask whether the additional $n$-bit key offers more freedom to
the adversary and enables attacks on more than these trivial 2 x 3 rounds. The
second main result, which is also the main theorem of this section, provides a negative
answer, and is as follows:

Theorem 2. The 7-round Even-Mansour cipher IDEM7 from seven independent
random permutations $\mathbf{P} = (P_1, \ldots, P_7)$ and two $n$-bit keys $(k_1, k_2)$
alternately xored is strongly and statistically $(q, \sigma, t, \varepsilon)$-seq-indifferentiable from $E$,
where $\sigma = q^3$, $t = O(q^3)$, and $\varepsilon \le$ […].

Proof. The proof is much simpler than that of Theorem 1, since there is no
recursive chain completion. In the following, we first present the simulator, then
sketch the proof. The full proof is deferred to the full version [GL15b].

Simulator for IDEM7. To make a distinction from the notations used in Sect. 3,
we denote by $\mathcal{S}^{E,\mathbf{P}}$ the simulator for IDEM7 with access to $E$ and $\mathbf{P}$. Similarly to
$S^{E,\mathbf{P}}$, $\mathcal{S}^{E,\mathbf{P}}$ also offers an interface $P(i, \delta, z)$ where $(i, \delta, z) \in \{1, \ldots, 7\} \times \{+, -\} \times
\{0,1\}^n$ and maintains a set $P_i$ for each $i$ to keep the already defined input-output pairs.
The other notations $P_i^+$, $P_i^-$, and $|P_i|$ are all similar to those introduced in the
context of IDEM15. $\mathcal{S}^{E,\mathbf{P}}$ uses an additional set $ES$ to maintain the history of
its queries to $E$, which is similar to the set of $\widetilde{E}^E$ introduced in Sect. 3. We also
use the notations $ES^+$, $ES^-$, and $|ES|$ similarly to Sect. 3.
Upon a query to $\mathcal{S}^{E,\mathbf{P}}.P(i, \delta, z)$, $\mathcal{S}^{E,\mathbf{P}}$ calls an inner procedure $\mathcal{S}^{E,\mathbf{P}}.P^{in}$,
and $\mathcal{S}^{E,\mathbf{P}}.P^{in}$ answers with $P_i^\delta(z)$ if $z \in P_i^\delta$, or queries $\mathbf{P}.P(i, \delta, z)$ to obtain the
answer $z'$ and adds $z$ and $z'$ to $P_i$ if $z'$ is not yet in $P_i$, and aborts otherwise.

The chain completing mechanism of $\mathcal{S}^{E,\mathbf{P}}$ is much simpler than that of $S^{E,\mathbf{P}}$,
and is somewhat close to the one that appeared in [CS15]: $\mathcal{S}^{E,\mathbf{P}}$ completes the potential
partial chains upon receiving a new query $\mathcal{S}^{E,\mathbf{P}}.P(i, \delta, x)$ with $i \in \{3, 4, 5\}$.
More precisely, when the query is of the form $\mathcal{S}^{E,\mathbf{P}}.P(3, +, x)$, $\mathcal{S}^{E,\mathbf{P}}.P(4, -, y)$, or
$\mathcal{S}^{E,\mathbf{P}}.P(5, +, x)$, $\mathcal{S}^{E,\mathbf{P}}$ considers all newly created tuples $(x_3, x_4, x_5) \in P_3^+ \times P_4^+ \times
P_5^+$, and computes $k_1 := P_4^+(x_4) \oplus x_5$, $k_2 := P_3^+(x_3) \oplus x_4$. $\mathcal{S}^{E,\mathbf{P}}$ then evaluates
in IDEM7 both backward and forward until obtaining the corresponding $y_7$ and
$x_7$, that is, computing the following values by calling $\mathcal{S}^{E,\mathbf{P}}.P^{in}$ and querying
$E$, in the order: (1) $y_2 := x_3 \oplus k_1$; (2) $y_1 := \mathcal{S}^{E,\mathbf{P}}.P^{in}(2, -, y_2) \oplus k_2$; (3) $y_0 :=
\mathcal{S}^{E,\mathbf{P}}.P^{in}(1, -, y_1) \oplus k_1$; (4) $y_7 := E.E(+, (k_1, k_2), y_0) \oplus k_2$; (5) $x_6 := P_5^+(x_5) \oplus k_2$;
(6) $x_7 := \mathcal{S}^{E,\mathbf{P}}.P^{in}(6, +, x_6) \oplus k_1$. $\mathcal{S}^{E,\mathbf{P}}$ finally aborts if $x_7 \in P_7^+$ or $y_7 \in P_7^-$,
otherwise adds $(x_7, y_7)$ to $P_7$ as a newly defined input-output pair.

When the query is $\mathcal{S}^{E,\mathbf{P}}.P(3, -, y)$, $\mathcal{S}^{E,\mathbf{P}}.P(4, +, x)$, or $\mathcal{S}^{E,\mathbf{P}}.P(5, -, y)$, $\mathcal{S}^{E,\mathbf{P}}$
considers all newly created tuples $(x_3, x_4, x_5) \in P_3^+ \times P_4^+ \times P_5^+$, computes $k_1$
and $k_2$, evaluates in IDEM7 both forward and backward until obtaining the
corresponding $x_1$ and $y_1$, and finally adds $(x_1, y_1)$ to $P_1$, or aborts if $x_1 \in P_1^+$ or
$y_1 \in P_1^-$. The strategy is illustrated in Fig. 1 (right).

To simplify the reasoning, we introduce a modified simulator $\mathcal{T}^{E,\mathbf{P}}$, which is
obtained by embedding two early abort conditions into $\mathcal{S}^{E,\mathbf{P}}$:

(i) when a chain $C$ is to be adapted at $P_1$ ($P_7$, resp.), right after the
assignment (lines 13 or 16 in the code below) inside the call to $P^{in}$ which led
to $C$ being detected, if the value $y_2$ ($x_6$, resp.) corresponding to $C$ has
been in $P_2^-$ ($P_6^+$, resp.), then $\mathcal{T}$ aborts. This is captured by the procedure
CheckFreeBuffer;

(ii) right after an assignment in $P_3$, $P_4$, or $P_5$ (lines 13/16), $\mathcal{T}$ aborts if the
assignment creates a “lock” in the middle three rounds: for $(i, j) \in \{(3, 4), (4, 5)\}$,
if $\exists (x_i, y_i), (x_i', y_i') \in P_i$ and $(x_j, y_j), (x_j', y_j') \in P_j$ such that $x_i \oplus y_j = x_i' \oplus y_j'$
and $y_i \oplus x_j = y_i' \oplus x_j'$. This is captured by the procedure CheckLock. This
situation is harmful for the procedure CompChain in some cases.
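The “lock” condition of item (ii) can be sketched as a collision search over pair combinations. The function name `creates_lock` and the set-of-pairs representation are assumptions; the sketch also assumes that the two combinations in the condition are meant to be distinct, since a combination trivially satisfies both equations with itself.

```python
def creates_lock(Pi, Pj):
    """Check whether two consecutive middle histories contain a lock.

    Pi, Pj: sets of (x, y) pairs. A lock is a pair of distinct combinations
    ((x_i, y_i), (x_j, y_j)) and ((x_i', y_i'), (x_j', y_j')) with
    x_i ^ y_j == x_i' ^ y_j' and y_i ^ x_j == y_i' ^ x_j'.
    """
    seen = set()
    for (xi, yi) in Pi:
        for (xj, yj) in Pj:
            signature = (xi ^ yj, yi ^ xj)
            if signature in seen:
                return True  # two distinct combinations share both xor values
            seen.add(signature)
    return False
```

Each combination is visited once, so any repeated signature comes from two different combinations.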

With all the above in mind, we have the pseudocode of $\mathcal{S}$ and $\mathcal{T}$ as follows. Note
that the underlined lines only exist in $\mathcal{T}$ (that is, $\mathcal{S}$ does not abort early).

Intermediate System $\Sigma_2$. Denote by $\Sigma_1(E, \mathcal{S}^{E,\mathbf{P}})$ the simulated system, and by
$\Sigma_3(\mathrm{IDEM}_7^{\mathbf{P}}, \mathbf{P})$ the real system. As a quite standard first step, we introduce
an intermediate system $\Sigma_2(\mathrm{IDEM}_7^{\mathcal{T}}, \mathcal{T}^{E,\mathbf{P}})$, in which the cipher IDEM7 calls
the interfaces of $\mathcal{T}$ to compute (as done in [MPS12, CS15]). The three systems
involved in this proof are depicted in Fig. 3.

Complexity of $\mathcal{S}/\mathcal{T}$. By construction, for $i \in \{3, 4, 5\}$, $|P_i|$ can be enlarged by at
most 1 only if $\mathcal{S}/\mathcal{T}$ receives a query $P(i, \delta, \cdot)$. Hence for any seq-distinguisher $D$
of total oracle query cost at most $q$, $\mathcal{S}/\mathcal{T}$ completes at most $|P_3| \cdot |P_4| \cdot |P_5| \le q^3$
chains, and queries $E$ at most $q^3$ times (that is, $|ES| \le q^3$).
 1: Simulator S^{E,P} / Simulator T^{E,P}:
 2: Variables: Sets {P_i} = {P_1, ..., P_7} and ES, initially empty
 3: public procedure P(i, δ, z)
 4:   return P^in(i, δ, z)
 5: private procedure P^in(i, δ, z)
 6:   if z ∉ P_i^δ then
 7:     z' := P.P(i, δ, z)
 8:     if z' ∈ P_i^{\bar{δ}} then // when i = 1, 7
 9:       abort
10:     CheckFreeBuffer(i, δ, z')
11:     if δ = + then
12:       CheckLock(i, z, z')
13:       P_i := P_i ∪ {(z, z')}
14:     else // δ = −
15:       CheckLock(i, z', z)
16:       P_i := P_i ∪ {(z', z)}
17:     if i = 3 ∧ δ = + then
18:       forall (x_4, x_5) ∈ P_4^+ × P_5^+ do
19:         CompChain(z, x_4, x_5, 3, 7)
20:     else if i = 4 ∧ δ = + then
21:       forall (x_3, x_5) ∈ P_3^+ × P_5^+ do
22:         CompChain(x_3, z, x_5, 4, 1)
23:     else if i = 5 ∧ δ = + then
24:       forall (x_3, x_4) ∈ P_3^+ × P_4^+ do
25:         CompChain(x_3, x_4, z, 5, 7)
26:     else if i = 3 ∧ δ = − then
27:       forall (x_4, x_5) ∈ P_4^+ × P_5^+ do
28:         CompChain(z', x_4, x_5, 3, 1)
29:     else if i = 4 ∧ δ = − then
30:       forall (x_3, x_5) ∈ P_3^+ × P_5^+ do
31:         CompChain(x_3, z', x_5, 4, 7)
32:     else if i = 5 ∧ δ = − then
33:       forall (x_3, x_4) ∈ P_3^+ × P_4^+ do
34:         CompChain(x_3, x_4, z', 5, 1)
35:   return P_i^δ(z)
36: private procedure CompChain(x_3, x_4, x_5, i, l)
37:   k_1 := P_4^+(x_4) ⊕ x_5
38:   k_2 := P_3^+(x_3) ⊕ x_4
39:   if l = 1 then
40:     x_6 := P_5^+(x_5) ⊕ k_2
41:     x_7 := P^in(6, +, x_6) ⊕ k_1
42:     x_8 := P^in(7, +, x_7) ⊕ k_2
43:     y_0 := E.E(−, (k_1, k_2), x_8)
44:     ES := ES ∪ {(y_0, x_8, (k_1, k_2))}
45:     x_1 := y_0 ⊕ k_1
46:     y_2 := x_3 ⊕ k_1
47:     y_1 := P^in(2, −, y_2) ⊕ k_2
48:     if x_1 ∈ P_1^+ ∨ y_1 ∈ P_1^- then
49:       abort
50:     P_1 := P_1 ∪ {(x_1, y_1)}
51:   else // l = 7
52:     y_2 := x_3 ⊕ k_1
53:     y_1 := P^in(2, −, y_2) ⊕ k_2
54:     y_0 := P^in(1, −, y_1) ⊕ k_1
55:     x_8 := E.E(+, (k_1, k_2), y_0)
56:     ES := ES ∪ {(y_0, x_8, (k_1, k_2))}
57:     y_7 := x_8 ⊕ k_2
58:     x_6 := P_5^+(x_5) ⊕ k_2
59:     x_7 := P^in(6, +, x_6) ⊕ k_1
60:     if x_7 ∈ P_7^+ ∨ y_7 ∈ P_7^- then
61:       abort
62:     P_7 := P_7 ∪ {(x_7, y_7)}
63: private procedure CheckFreeBuffer(i, δ, z')
64:   if (i, δ) = (3, +) ∧ ∃(x_4, y_5) ∈ P_4^+ × P_5^- s.t. z' ⊕ x_4 ⊕ y_5 ∈ P_6^+ then
65:     abort
66:   else if (i, δ) = (4, +) ∧ ∃(x_3, x_5) ∈ P_3^+ × P_5^+ s.t. x_3 ⊕ z' ⊕ x_5 ∈ P_2^- then
67:     abort
68:   else if (i, δ) = (5, +) ∧ ∃(y_3, x_4) ∈ P_3^- × P_4^+ s.t. y_3 ⊕ x_4 ⊕ z' ∈ P_6^+ then
69:     abort
70:   else if (i, δ) = (3, −) ∧ ∃(y_4, x_5) ∈ P_4^- × P_5^+ s.t. z' ⊕ y_4 ⊕ x_5 ∈ P_2^- then
71:     abort
72:   else if (i, δ) = (4, −) ∧ ∃(y_3, y_5) ∈ P_3^- × P_5^- s.t. y_3 ⊕ z' ⊕ y_5 ∈ P_6^+ then
73:     abort
74:   else if (i, δ) = (5, −) ∧ ∃(x_3, y_4) ∈ P_3^+ × P_4^- s.t. x_3 ⊕ y_4 ⊕ z' ∈ P_2^- then
75:     abort
76: private procedure CheckLock(i, x, y)
77:   if i = 3 ∧ ∃((x_3, y_3), (x_4, y_4), (x_4', y_4')) ∈ P_3 × P_4 × P_4
78:     s.t. x ⊕ y_4' = x_3 ⊕ y_4 ∧ y ⊕ x_4' = y_3 ⊕ x_4 then abort
79:   if i = 5 ∧ ∃((x_5, y_5), (x_4, y_4), (x_4', y_4')) ∈ P_5 × P_4 × P_4
80:     s.t. x_4 ⊕ y_5 = x_4' ⊕ y ∧ y_4 ⊕ x_5 = y_4' ⊕ x then abort
81:   if i = 4 ∧ ∃((x_3, y_3), (x_3', y_3'), (x_4, y_4)) ∈ P_3 × P_3 × P_4
82:     s.t. x_3 ⊕ y_4 = x_3' ⊕ y ∧ y_3 ⊕ x_4 = y_3' ⊕ x then abort
83:   if i = 4 ∧ ∃((x_5, y_5), (x_5', y_5'), (x_4, y_4)) ∈ P_5 × P_5 × P_4
84:     s.t. x_4 ⊕ y_5 = x ⊕ y_5' ∧ y_4 ⊕ x_5 = y ⊕ x_5' then abort



Fig. 3. Systems used in the seq-indifferentiability proof for $\mathrm{IDEM}_7$. The number in red
and italic illustrates the order of the queries/actions (of the sequential distinguisher)
(Color figure online).


The running time of S is clearly dominated by the executions of CompChain,
the number of which is $O(q^3)$. Therefore, S runs in time $O(q^3)$.

Indistinguishability of Outputs. We first upper bound the abort probability of 
T. Consider the two early abort conditions first: 

(i) The overall probability that T aborts during CheckFreeBuffer is at most 

2 q 6 . 

2 n — g ’ 

(ii) The overall probability that T aborts during CheckLock is at most 2 n_ q ; 
Then the two types of main abortions of T are as follows: 

(i) a random answer from Pi or P 7 collides with a previously added adapted 
value. The overall probability is at most 2 ^- 2 q 3 ? 

(ii) T aborts due to adaptations. The overall probability is at most 2 nl 2q z ? this 
is obtained by carefully analyzing each case. A key point is that the buffer 
rounds ensure that any two chains completed during the same call to P m 
will diverge at the adaptation round - the case is slightly similar to IDEM 15 . 

These cumulate to (assuming q 3 < ^-). For a tuple (E,P), if T does not 
abort in ^(IDEM^ ,T E,P ), then S does not abort in IE^(E, <S E,P ) ; and then 

97^6 26 6 6 

the final bound - = yr + ifn is obtained by a randomness mapping argument, 
where the statistical distance is due to \ES\ < q 3 random values from E. □ 
5 Conclusion

As a first step towards understanding the security of iterated Even-Mansour with
key-length larger than the block-size, this work analyzes (seq-)indifferentiability
of Even-Mansour with two independent round-keys alternately xored, and
proves that 7 rounds are necessary and sufficient to achieve sequential
indifferentiability while 15 rounds are sufficient to achieve full indifferentiability.
Abstract. In the past few years, lightweight cryptography has become
a popular research discipline with a number of ciphers and hash func- 
tions proposed. The designers’ focus has been predominantly to mini- 
mize the hardware area, while other goals such as low latency have been 
addressed rather recently only. However, the optimization goal of low 
energy for block cipher design has not been explicitly addressed so far. 
At the same time, it is a crucial measure of goodness for an algorithm. 
Indeed, a cipher optimized with respect to energy has wide applications, 
especially in constrained environments running on a tight power/energy 
budget such as medical implants. 

This paper presents the block cipher Midori (the name of the cipher
is the Japanese translation of the word "green") that is optimized with
respect to the energy consumed by the circuit per bit in an encryption or
decryption operation. We deliberate on the design choices that lead to
low energy consumption in an electrical circuit, and try to optimize each
component of the circuit as well as its entire architecture for energy.
An added motivation is to make both encryption and decryption
functionalities available by a small tweak in the circuit that would not incur
significant area or energy overheads. We propose two energy-efficient
block ciphers Midori128 and Midori64 with block sizes equal to 128 and
64 bits respectively. These ciphers have the added property that a circuit
that provides both the functionalities of encryption and decryption can
be designed with very little overhead in terms of area and energy. We
compare our results with other ciphers with similar characteristics: it
was found that the energy consumptions of Midori64 and Midori128 are
far better than those of ciphers like PRINCE and NOEKEON.


Keywords: Lightweight block cipher • Low energy circuits 


1 Introduction 

The field of lightweight cryptography has gone into overdrive, as is evident from the
number of cipher proposals that have emerged in the past few years, like CLEFIA
[32], KATAN [13], KLEIN [18], LED [19], PRESENT [11], Piccolo [31], PRINCE [12] and
SIMON/SPECK [6], to name a few. However, the Advanced Encryption Standard
(AES) [16] still remains the de facto standard when it comes to practical
lightweight encryption. The past few years have seen several low-power/area
architectures for AES being reported in the literature [17,27,30]. However, there has been
little work that goes on to determine the design choices that lead to the most
energy-efficient architecture. There are many parameters that contribute to the
efficiency of a given lightweight design, with area, power, throughput and energy
being the foremost among them. Power and energy are correlated parameters,
as energy is essentially the time integral of power, and power is equivalent to
the energy consumed per unit time or simply the rate of energy consumption.
Energy consumption, thus, is a measure of the total work done by the voltage source
during the execution of an operation. Hence, in many ways, energy rather than
power may be a more relevant parameter to measure the efficiency of a design.
Serial architectures of any block cipher that reduce the width of the datapath 
and reuse components, have a smaller power footprint than round based imple- 
mentations in which the data path is equal to the block length of the cipher. 
However, serial implementations usually have high latency, that is, they take 
much longer to compute the result of an encryption operation than their round 
based counterparts, and as a result may end up consuming more energy. There- 
fore, there is no guarantee that low power architectures would necessarily lead 
to low energy architectures and vice versa. 

In [5,21], several lightweight block ciphers were evaluated with respect to
various hardware performance metrics, with a particular focus on the energy cost.
A formal model for energy consumption in any r-round unrolled block
cipher architecture was proposed in [3]. However, these papers do not specifically
outline design choices that lead to energy-efficient designs.


1.1 Our Contributions 

In this paper, we first try to identify design choices that are energy-efficient
and the tradeoffs that they involve. We shed some light on the design
considerations that govern low energy circuits, look at several factors like
clock frequency, architecture and loop unrolling, and lay down
some general rules of thumb that help in optimizing for energy. Then, we choose
components specifically tailored to meet the requirements of low energy design.
In particular, we develop energy-efficient linear layers and non-linear layers.

We use 4 × 4 almost MDS binary matrices, which are more efficient than
4 × 4 MDS matrices in terms of area and signal delay. Note that the branch
numbers (the smallest nonzero sum of active inputs and outputs of the matrix)
of MDS and almost MDS matrices are 5 and 4, respectively. However, due to
the smaller branch number, ciphers employing almost MDS matrices are likely
to require more rounds to guarantee security against several
attacks. To address this issue, we propose optimal cell-permutation layers which
are aimed at improving diffusion speed and increasing the number of active
S-boxes in each round with low implementation overheads. Our optimal
cell-permutations drastically improve the minimum number of differentially/linearly
active S-boxes in each round, and achieve faster diffusion compared to a
ShiftRow-type permutation. We construct a lightweight and small-delay 4-bit S-box by
focusing on the dependency of the computation in S-boxes. In terms of signal delay,
our S-boxes are 1.5 times and twice as fast as those of PRINCE and PRESENT,
respectively. Since the S-box layer is one of the most critical and expensive
operations of the cipher, our new S-boxes contribute to low energy
consumption.

Combining those new constructions, we design a family of low energy block
ciphers, Midori, which is composed of two variants: Midori64 and Midori128. These
provide the functionality for both encryption and decryption with minimal area
and energy overhead. The two variants support a 128-bit secret key and a
64/128-bit block, respectively. Security-wise, Midori64 and Midori128 do not claim
related-, known- and chosen-key security as it is not relevant in our target
application. Using the STM 90 nm standard cell library, both these ciphers consume less
than 1.89 pJ per bit encrypted, which is far better than ciphers like
PRINCE and NOEKEON [16]. These ciphers are particularly useful for applications
that run on a tight energy budget, e.g. active RFID tags, sensor nodes, medical
implants and battery operated portable devices.


1.2 Organization of the Paper 

In Sect. 2, we look at some design considerations that help to minimize energy 
consumption in block cipher circuits. In Sect. 3, we outline the algorithmic
specifications of the Midori128 and Midori64 ciphers. In Sect. 4, we explain our design
decisions vis-a-vis the observations of Sect. 2. In Sect. 5, we outline the security
analysis of the ciphers. Section 6 contains implementation results of our cipher 
in hardware using the standard cell library of the STM 90 nm logic process. 
Section 7 concludes the paper. 

2 Design Considerations for Low Energy 

For any given block cipher, three factors are likely to play a dominant role in 
determining the quantity of energy dissipated in the circuit: 

(a) Frequency of the Clock used to drive the circuit, 

(b) Architecture of the individual components, 

(c) Unrolling round functions in the circuit. 

We will try to understand the significance of each of these parameters in the
context of energy consumption. Let us start with clock frequency. Two
components characterize the amount of energy dissipated in a CMOS circuit:

- Dynamic dissipation due to the charging and discharging of load capacitances
and the short-circuit current,

- Static dissipation due to leakage current and other current drawn continuously
from the power supply.

The total energy dissipation for a CMOS gate can be written as

$$E_{gate} = E_{load} + E_{sc} + E_{leakage}.$$

The quantity $E_{load}$ is the energy dissipated for charging and discharging
the capacitive load of a gate when output transitions occur. The energy
dissipated per 0 → 1 / 1 → 0 transition is given as

$$E = \int_0^t v\,i\,dt = \int_0^t v\,C_L \frac{dv}{dt}\,dt = C_L \int_0^{V_{DD}} v\,dv = \frac{1}{2} C_L V_{DD}^2.$$

The energy due to the short-circuit current, $E_{sc}$, is dissipated in a CMOS gate
when, during a transition, both the n- and the p-transistors are on for a short
period of time. The energy due to leakage currents, $E_{leakage}$, is rather small,
and is mainly caused by the sub-threshold leakage current, which is the
drain-source current in a CMOS gate when the transistor is OFF. This figure
is becoming increasingly important as the technology is scaling down making 
the sub-threshold leakage more significant. However as pointed out in [3,21], 
the effect of the leakage energy at high clock frequencies is minimal. As such, 
energy becomes a metric which is a measure of the total switching activity of 
a circuit during the process. For sufficiently high frequencies, the energy con- 
sumption required to compute an encryption/decryption operation is essentially 
independent of frequency of operation. In our experiments, for circuits imple- 
mented using the standard cell library based on the STM 90 nm low leakage 
process, at frequencies higher than 1 MHz, leakage energy is usually less than 
1 % of the total energy dissipated in the circuit. 
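
As a small numerical illustration of the switching-energy formula above, the following Python sketch evaluates $\frac{1}{2} C_L V_{DD}^2$ and cross-checks it against a direct numerical evaluation of $C_L \int_0^{V_{DD}} v\,dv$. The load capacitance and supply voltage are hypothetical values picked only for the example; they are not taken from the STM 90 nm library.

```python
def switching_energy(c_load_farads: float, vdd_volts: float) -> float:
    """Energy per 0->1 or 1->0 output transition: (1/2) * C_L * V_DD^2."""
    return 0.5 * c_load_farads * vdd_volts ** 2


def switching_energy_numeric(c_load_farads: float, vdd_volts: float,
                             steps: int = 100_000) -> float:
    """Midpoint-rule evaluation of C_L * integral_0^VDD v dv, as a cross-check."""
    dv = vdd_volts / steps
    return c_load_farads * sum((k + 0.5) * dv * dv for k in range(steps))


# Hypothetical operating point (assumed for illustration only).
C_LOAD = 10e-15  # 10 fF load capacitance
VDD = 1.2        # supply voltage in volts

closed_form = switching_energy(C_LOAD, VDD)
numeric = switching_energy_numeric(C_LOAD, VDD)
print(f"closed form: {closed_form:.3e} J, numeric: {numeric:.3e} J")
```

Because the energy depends only on $C_L$ and $V_{DD}$ and not on how fast the transition happens, the dynamic energy of an operation is set by how many transitions occur, which is why switching activity is the quantity tracked in the rest of this section.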

To understand the significance of the other parameters we performed the 
following experiments. Consider a case in which two Rijndael S-boxes are placed 
one after the other in a circuit as shown in Fig. 1. The signals to the input of 
the first S-box, the second S-box, and the output of the second S-box are named
S1xD, S2xD and S3xD respectively. Note that analyzing this situation is
particularly useful for understanding the energy consumption trends of unrolled designs
where logic blocks are placed sequentially one after the other.

Let us assume that the signal S1xD comes from an 8-bit register, so that it
“cleanly” switches between successive byte values, i.e. all the bits of S1xD make
logic transitions at the same point of time, which is usually the rising clock edge
for synchronous circuits. The signal S2xD will switch between various values in
a given time interval $0 \to \tau_d$, before settling down to a stable value. The value
$\tau_d$, which is the delay experienced by the signal S1xD, usually depends on the
cell library and the architecture adopted to implement the S-boxes. Another
parameter dependent on the logic process and architecture of the S-box is the
switching activity of S2xD, which can be informally defined as the number of
logic transitions made by this signal in the period $0 \to \tau_d$.

Fig. 1. S-boxes placed sequentially

Fig. 2. The signals S1xD, S2xD, S3xD

The second S-box $S_2$ sees this signal S2xD, which is switching between
various values in the time interval $0 \to \tau_d$. Therefore, the switching activity of $S_2$
is actually at least double that of $S_1$, as it would continue switching for another
$\tau_d$ before producing a stable signal. Figure 2 provides an example in which the
three signals for the pair of Rijndael S-boxes (implemented using the Canright
[14] architecture in the standard cell library of the STM 90 nm logic process, at
10 MHz) are shown. The synthesis for each S-box was done separately, so that
the synthesis tool would not group together gates from the first and the
second S-box in order to save area. Since the energy consumption of a logic block
depends on the switching activity of all its nodes, the S-box $S_2$ should naturally
consume more energy than $S_1$. Again, the exact energy consumed by $S_2$ relative
to $S_1$ depends on factors like

(a) the logic process and hence the value of $\tau_d$,

(b) the architecture of the S-box and hence the amount of “extra” switching
experienced by $S_2$ and

(c) the algebraic structure of the S-box, i.e. its component Boolean functions.

The extra switching activity would be proportional to the average number of
gates that undergo a 0 → 1 / 1 → 0 transition during the period $\tau_d \to 2\tau_d$
(the average is typically taken over all possible transitions of the signal S1xD).
Similarly, if a third S-box $S_3$ were placed after $S_2$, then it too would experience an
increase in switching activity relative to $S_2$ that would depend on the average
number of gates switched in the period $2\tau_d \to 3\tau_d$. The increase in switching
activity of $S_3$ over $S_2$ is likely to be roughly the same as that of $S_2$ over $S_1$,
since the number of gates in $S_2$ that switch in $\tau_d \to 2\tau_d$ and those in $S_3$ between
$2\tau_d \to 3\tau_d$, when averaged over transitions of S1xD, are likely to be the same.

And so if it so happens that $S_1$, $S_2$ and $S_3$ drive the same amount of capacitive
load, the difference between the energy consumed by $S_2$ and $S_1$ is likely to
be the same as that between $S_3$ and $S_2$.

Fig. 3. Energy per cycle $E_i$ in the $i$-th S-box

Fig. 4. Energy $\mathcal{E}_n$ required to compute $S^{10}(x)$ using $n$ S-boxes

Taking these ideas forward, if we connect a series of n S-boxes sequentially, 
the energy consumed by each S-box in a given period of time is likely to be more 
than the previous S-box, as the switching activity of the S-boxes are likely to 
increase from the first to the last. We tested three different architectures for the
Rijndael S-box. The first is the Canright [14] architecture, which is acknowledged
to be the smallest known implementation in terms of gate area. The second is the
Look-up Table (LUT) based architecture as synthesized by the Synopsys Design
Compiler. The LUT architecture, while larger than the Canright architecture in
terms of area, is much faster in terms of signal delay from the input to output
port. The third is a Decoder-Switch-Encoder (DSE) based architecture [7], which
is optimal in terms of power/energy consumption. Over the years there has
been much research on low power Rijndael S-boxes [28,34], but the DSE based
architecture is widely believed to be the most power/energy-efficient on account of its
unique architecture. The 8-bit input is first decoded to a set of 256 wires. The
S-box functionality is achieved by a shuffling of wires after which the output 
is produced by an encoding of the 256 shuffled wires (i.e. the inverse of the 
decoding process). The entire circuit can be constructed by AND/NAND gates, 
which have very low switching probability and since the S-box functionality is 
provided by wire shuffling, all 8-bit S-boxes can be constructed in this manner. 
The architecture offers very low switching per change of input bit: a maximum 
of 25 % of the gates switch when one of the input bits is flipped. 
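
The decode, shuffle and encode stages described above can be modelled in a few lines of Python. This is only a behavioural sketch, not a gate-level description: the one-hot list stands in for the 256 wires, and `TABLE` is a hypothetical stand-in bijection chosen for the example (it is not the Rijndael S-box).

```python
def decode(x: int, width: int = 8) -> list[int]:
    """Decoder stage: map a width-bit value to a one-hot vector of 2**width wires."""
    return [1 if wire == x else 0 for wire in range(1 << width)]


def shuffle(wires: list[int], table: list[int]) -> list[int]:
    """Switch stage: wire `src` is routed to position table[src]; no logic gates involved."""
    out = [0] * len(wires)
    for src, bit in enumerate(wires):
        out[table[src]] = bit
    return out


def encode(wires: list[int]) -> int:
    """Encoder stage: inverse of decode, i.e. the index of the single active wire."""
    return wires.index(1)


def dse_sbox(x: int, table: list[int]) -> int:
    """Evaluate an S-box given as a lookup table through the three DSE stages."""
    return encode(shuffle(decode(x), table))


# Hypothetical 8-bit bijection used only to exercise the model (7 is coprime to 256).
TABLE = [(7 * x + 3) % 256 for x in range(256)]

print(all(dse_sbox(x, TABLE) == TABLE[x] for x in range(256)))
```

Only the `shuffle` stage depends on the S-box table, which matches the observation that any 8-bit S-box can be built this way with the same decoder and encoder.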

We connected 10 instances of the S-box constructed using the Canright archi- 
tecture (using the standard cell library of the STM 90 nm logic process) sequen- 
tially and used the Synopsys Power Compiler to estimate the energy consumed 
per clock cycle $E_i$ in each of the successive S-boxes $S_i$ at a clock frequency of
10 MHz. We repeated the same experiment for the LUT and DSE based S-boxes.
The results can be seen in Fig. 3. It can be seen that the successive instances of
the LUT based S-box, which has a delay of around 2.1 ns, consume much less
energy as compared to the Canright S-box, which has a delay of around 2.9 ns. In
both the LUT and Canright architectures, the switching activity in the circuit is
roughly proportional to the signal delay across the input and output ports. This
is however not the case for the DSE S-box, which, although it has a delay of around
2.3 ns, experiences a much lower increase in successive values of $E_i$ because the
total switching activity in the delay period is much lower.

The above analysis is particularly relevant due to two reasons. The first 
pertains to the structure of especially SPN based ciphers, in which each round 
typically consists of a substitution, a linear layer and a key addition placed 
sequentially. A substitution layer with low switching activity and signal delay 
ensures that the linear layer consumes less energy. Similarly a linear layer with 
similar characteristics ensures that any circuit placed after it consumes less 
energy. The second pertains to the consideration of round unrolled circuits. An
r-round unrolled circuit for a block cipher is one in which the circuit computes the
results of r successive round functions in a single clock cycle. So if the block cipher
specification calls for N executions of the round function, an r-round unrolled
circuit will compute the result of the encryption operation in $\lceil N/r \rceil$ cycles. An
r-round unrolled architecture is constructed by placing the circuits for r round
functions sequentially, followed by a register. The above analysis suggests that
any multiple round unrolled circuit is unlikely to be efficient in terms of energy
consumption. In the above example, using the LUT based S-box, computing the
result of two S-box operations (i.e. S(S(x))) over 2 cycles costs 2 × 1.88 = 3.76
pJ. Computing the same over one cycle by sequential placement of 2 S-boxes
will cost 1.88 + 3.91 = 5.79 pJ. Similarly computing three S-box operations
over three cycles takes 5.64 pJ, whereas the same over one cycle would take
1.88 + 3.91 + 6.40 = 12.39 pJ. Figure 4 shows the cumulative energy cost $\mathcal{E}_n$ of
computing $S^{10}(x)$ using a sequence of n S-boxes (i.e. in $\lceil 10/n \rceil$ cycles), for different
values of n. It can be seen that, irrespective of the architecture of the S-box, the
energy consumption is optimal for n = 1, i.e. computing the operation over 10
cycles using a single S-box, even if this involves updating the register 10 times
in the process.
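
The two-S-box comparison above can be reproduced with a short Python sketch. It uses the first two per-stage energies quoted for the LUT-based S-box and the $\lceil N/r \rceil$ cycle count; the function names are illustrative, and the model deliberately ignores the energy of the register update, which is a simplifying assumption of this sketch.

```python
import math


def cycles_needed(total_rounds: int, unroll: int) -> int:
    """Clock cycles for N round-function executions on an r-round unrolled circuit."""
    return math.ceil(total_rounds / unroll)


def serial_energy_pj(first_stage_pj: float, operations: int) -> float:
    """One S-box reused over `operations` cycles; each cycle costs the first-stage energy."""
    return first_stage_pj * operations


def unrolled_energy_pj(stage_energies_pj: list[float]) -> float:
    """S-boxes chained within one cycle; each stage costs its own (growing) energy."""
    return sum(stage_energies_pj)


# Energies per cycle of the first and second LUT-based S-box in the chain (pJ), from the text.
LUT_STAGES_PJ = [1.88, 3.91]

two_cycles = serial_energy_pj(LUT_STAGES_PJ[0], 2)
one_cycle = unrolled_energy_pj(LUT_STAGES_PJ)
print(f"S(S(x)) over two cycles: {two_cycles:.2f} pJ")
print(f"S(S(x)) in one cycle:    {one_cycle:.2f} pJ")
print("cycles for N = 16, r = 1, 2, 3:", [cycles_needed(16, r) for r in (1, 2, 3)])
```

Unrolling reduces the cycle count but, because later stages see glitching inputs, the summed stage energies grow faster than the cycle count shrinks.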

2.1 S-Box: 4-Bit Vs 8-Bit 

In light of the above analysis, it is clear that a design using a 4-bit S-box is 
more efficient in terms of energy consumed per cycle than a design using an 
8-bit S-box. This is primarily due to the fact that a 4-bit S-box will typically 
have a lower signal delay as compared to an 8-bit S-box. However 8-bit S-boxes 
offer higher non-linearity and lower values of the DP/LP co-efficient, and so in 
order to sustain similar security margins, a design using a 4-bit S-box will
typically need more executions of the round function. To put things in perspective,
we performed the energy evaluation of the circuit of the SPN round function
(with blocksize equal to 128 bits) in which we experimented with two
different substitution layers, one having sixteen 8-bit S-boxes and the other having
thirty-two 4-bit S-boxes. The Rijndael MixColumn was used in both cases, and
the STM 90 nm cell library was used to synthesize the circuits. For this purpose
four different 8-bit S-boxes were chosen. Apart from the LUT and DSE based
Rijndael S-boxes, we chose the S-boxes used in mCrypton [24] and Whirlpool
[4]. Unlike AES, these S-boxes can be functionally defined in terms of smaller
4-bit S-boxes, and so can be implemented efficiently in hardware. Additionally
we chose three 4-bit S-boxes: the generic DSE based S-box (note that since the
S-box functionality is provided by a wire shuffle, all DSE S-boxes will have the same
energy consumption), and the S-boxes used in PRINCE [12] and PRESENT [11].

Table 1. A comparison of energy per cycle for round functions constructed with (A)
16 8-bit S-boxes, (B) 32 4-bit S-boxes.

      S-box            Delay in S (ns)   Energy per cycle (pJ)
A     DSE (8-bit)      2.25              14.00
      Rijndael (LUT)   2.10              38.88
      mCrypton         1.59              13.20
      Whirlpool        1.33              16.38
B     DSE (4-bit)      0.81              7.92
      PRINCE           0.36              4.87
      PRESENT          0.45              6.18

Table 1 reports the energy per cycle figures at a frequency of 10 MHz. It can
be seen that the DSE architecture is not as effective an energy saving measure
for 4-bit S-boxes. It is also interesting to note that, from the point of view of
energy, 4-bit S-boxes outperform their 8-bit counterparts by a ratio of around
2:1. Thus, the use of 4-bit S-boxes seems to be an efficient configuration even if
the number of rounds in the encryption algorithm has to be increased in order
to maintain security margins.
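
The ratio can be checked directly against the Table 1 figures. The Python sketch below computes two pairings that seem natural (like-for-like DSE, and the least favourable pairing for the 4-bit designs); which pairing the "around 2:1" summary refers to is not stated in the text, so the choice of pairings here is an assumption.

```python
# Energy per cycle in pJ, copied from Table 1.
ENERGY_8BIT_PJ = {
    "DSE (8-bit)": 14.00,
    "Rijndael (LUT)": 38.88,
    "mCrypton": 13.20,
    "Whirlpool": 16.38,
}
ENERGY_4BIT_PJ = {
    "DSE (4-bit)": 7.92,
    "PRINCE": 4.87,
    "PRESENT": 6.18,
}

# Same architecture, different S-box width.
dse_ratio = ENERGY_8BIT_PJ["DSE (8-bit)"] / ENERGY_4BIT_PJ["DSE (4-bit)"]

# Least favourable pairing for the 4-bit designs: cheapest 8-bit vs. costliest 4-bit.
worst_case_ratio = min(ENERGY_8BIT_PJ.values()) / max(ENERGY_4BIT_PJ.values())

print(f"DSE 8-bit / DSE 4-bit:      {dse_ratio:.2f}")
print(f"min(8-bit) / max(4-bit):    {worst_case_ratio:.2f}")
```

Even the least favourable pairing leaves the 8-bit round function well above the 4-bit one per cycle, so the extra rounds a 4-bit design needs have some headroom before the advantage is lost.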

2.2 Feistel Vs SPN and Complex Vs Simple Round Function 

As far as designing lightweight ciphers is concerned, both SPN and Feistel
architectures have their respective advantages and disadvantages. Feistel structures
(e.g. TWINE [33], Piccolo [31], SIMON [6]) usually apply a round function to
only one half of the state, and as such these structures can be implemented in hardware
with low average power. Also, implementing the inverse of Feistel constructions
is not very difficult and hence a circuit that provides functionalities for both
encryption and decryption can be designed with minimal overhead. However,
Feistel structures introduce non-linearity in only one half
of the state in every round and hence, to maintain security margins, such
constructions usually require more executions of the round function as compared
to SPN structures. As such, Feistel constructions are not suited for low latency
implementations. Most SPN constructions, on the other hand, apply their
transformation function to the entire state and so can be implemented using
fewer rounds. In principle, if n rounds of an SPN function and m rounds of a
Feistel function (where m > n) have the same security margin and similar energy
expenditure, then using the n-round SPN function makes more sense since less
energy is consumed to update the state and key register for n rounds. A similar
argument can be used to resolve the choice between (a) simple round functions
with more rounds (e.g. PRESENT [11]) and (b) complex round functions with
fewer rounds.


2.3 Effect of Key Schedule 

Generating separate round keys in each round by means of a key schedule
operation can eat into the energy budget as it incurs the added cost of updating
the key register in every round. For example, using the STM 90 nm standard cell
library, in AES (with DSE S-box), the key schedule consumes 25 %
of the total energy. For PRESENT, the key schedule consumes close to
32 % of the total energy. So in designs meant primarily for low energy consumption,
designers should look to avoid the key schedule operation. This would also be
efficient in terms of area as it would not be necessary to include a key register
in the design.


2.4 Main Conclusion: Low-Energy Design Choices 

We can now state some conclusions that will serve as pointers for a good low 
energy block cipher design. From the point of view of energy, we know that a 
round based architecture is usually optimal. Thus we concentrate on an effi- 
cient round based construction that would with minimal overhead provide both 
the functionalities of encryption and decryption. A cipher like PRINCE, although
it provides both encryption and decryption functionalities with a minimal tweak in the
circuit, does not have an equally energy-efficient round based construction [12],
as it needs to accommodate 3 different round functions in the same circuit.
We have also seen that components with low switching and delay tend to per- 
form better energy wise. So another requirement is choosing components with 
low area and delay. In this context, it makes sense to choose 4-bit S-boxes 
over 8-bit S-boxes. We choose SPN architecture over Feistel to minimize the 
number of rounds in the design. And since providing the functionalities of both 
encryption and decryption is an added motivation, we try to include components 
which in addition to having low area/delay, are also involutions. Having such 
components would minimize any additional overhead required for providing the 
functionalities of both encryption and decryption. We will now present the spec- 
ifications for the proposed block cipher and in Sect. 4 we will explain the design 
decisions in the context of the observations made in this Section. 

3 Specification 

Midori is a family of two block ciphers: Midori64 and Midori128. Both ciphers
accept 128-bit keys, and have a different block size n (n = 64 for Midori64 and
n = 128 for Midori128). The basic parameters of Midori64 and Midori128 are
shown in Table 2.

Table 2. Parameters for Midori64 and Midori128

            block size (n)   key size   cell size (m)   number of rounds
Midori64    64               128        4               16
Midori128   128              128        8               20

Midori is a variant of a Substitution Permutation Network (SPN), which
consists of the S-layer and the P-layer, and uses the following 4 × 4 array called
state as a data expression:

$$S = \begin{pmatrix} s_0 & s_4 & s_8 & s_{12} \\ s_1 & s_5 & s_9 & s_{13} \\ s_2 & s_6 & s_{10} & s_{14} \\ s_3 & s_7 & s_{11} & s_{15} \end{pmatrix}$$

where the size of each cell m is 4 and 8 bits for Midori64 and Midori128,
respectively, i.e., $s_i \in \{0,1\}^m$, m = 4 for Midori64 and m = 8 for Midori128.
A 64-bit or a 128-bit plaintext P is loaded into the state, and the i-th round
output state is defined as $S_i$, namely $S_0 = P$.


3.1 S-Boxes and Matrices 

S-box: Midori utilizes two types of bijective 4-bit S-boxes, $Sb_0$ and $Sb_1$, where
$Sb_0, Sb_1 : \{0,1\}^4 \to \{0,1\}^4$ (see Table 3). $Sb_0$ and $Sb_1$ are used in Midori64 and
Midori128, respectively. Note that $Sb_0$ and $Sb_1$ both have the involution property.

Midori128 utilizes four different 8-bit S-boxes $SSb_0$, $SSb_1$, $SSb_2$ and $SSb_3$,
where $SSb_0, SSb_1, SSb_2, SSb_3 : \{0,1\}^8 \to \{0,1\}^8$. Mathematically, each $SSb_i$
consists of input and output bit permutations and two $Sb_1$'s as shown in Fig. 5.
Each output bit permutation is taken as the inverse of the corresponding input
bit permutation to keep the involution property. Let the input bit permutation
of each $SSb_i$ be referred to as $p_i$. Let $x_{[i]}$ denote the i-th bit of x, where $x_{[0]}$ is
the most significant bit (MSB). Then denoting $p_i(x) = y^{(i)}$, we have

$$y^{(0)}_{[0,1,2,3,4,5,6,7]} = x_{[4,1,6,3,0,5,2,7]}, \qquad y^{(1)}_{[0,1,2,3,4,5,6,7]} = x_{[1,6,7,0,5,2,3,4]},$$

$$y^{(2)}_{[0,1,2,3,4,5,6,7]} = x_{[2,3,4,1,6,7,0,5]}, \qquad y^{(3)}_{[0,1,2,3,4,5,6,7]} = x_{[7,4,1,2,3,0,5,6]}.$$

Table 3. 4-bit bijective S-boxes $Sb_0$ and $Sb_1$ in hexadecimal form

x         0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
Sb_0[x]   c  a  d  3  e  b  f  7  8  9  1  5  0  2  4  6
Sb_1[x]   1  0  5  3  e  2  f  7  d  a  9  b  c  8  4  6
The output permutation used in each $SSb_i$ is simply the inverse of the map $p_i$.
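
The S-box definitions above can be checked mechanically. The Python sketch below takes the two 4-bit tables from Table 3 and the four input bit permutations, builds each 8-bit S-box as "permute, apply the 4-bit S-box to both nibbles, apply the inverse permutation", and verifies the involution property. Two points are assumptions of this sketch rather than statements from the text: that the two 4-bit S-boxes act on the upper and lower nibble of the permuted byte (Fig. 5 is not reproduced here), and the helper names.

```python
SB0 = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7, 0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
SB1 = [0x1, 0x0, 0x5, 0x3, 0xE, 0x2, 0xF, 0x7, 0xD, 0xA, 0x9, 0xB, 0xC, 0x8, 0x4, 0x6]

# Input bit permutations p_0..p_3: output bit j takes input bit PERMS[i][j] (bit 0 = MSB).
PERMS = [
    [4, 1, 6, 3, 0, 5, 2, 7],
    [1, 6, 7, 0, 5, 2, 3, 4],
    [2, 3, 4, 1, 6, 7, 0, 5],
    [7, 4, 1, 2, 3, 0, 5, 6],
]


def get_bit(x: int, j: int) -> int:
    """Return bit j of an 8-bit value, counting from the most significant bit."""
    return (x >> (7 - j)) & 1


def permute(x: int, perm: list[int]) -> int:
    """Apply a bit permutation: output bit j equals input bit perm[j]."""
    y = 0
    for j in range(8):
        y |= get_bit(x, perm[j]) << (7 - j)
    return y


def inverse(perm: list[int]) -> list[int]:
    """Index list of the inverse bit permutation."""
    inv = [0] * 8
    for j, src in enumerate(perm):
        inv[src] = j
    return inv


def ssb(i: int, x: int) -> int:
    """8-bit S-box SSb_i: p_i, then the 4-bit S-box on each nibble (assumed split), then p_i^{-1}."""
    y = permute(x, PERMS[i])
    y = (SB1[y >> 4] << 4) | SB1[y & 0xF]
    return permute(y, inverse(PERMS[i]))


print(all(SB0[SB0[x]] == x and SB1[SB1[x]] == x for x in range(16)))
print(all(ssb(i, ssb(i, x)) == x for i in range(4) for x in range(256)))
```

Because the output permutation is the inverse of the input permutation and the nibble S-box is itself an involution, the composed 8-bit map is an involution for any choice of bit permutation.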
Matrix: Midori utilizes an involutive binary matrix M defined as follows: 

/0 1 1 1 \ 

M = 1011 . 

110 1 

V 1110/ 

The matrix M updates four m-bit values (xq,xi,X 2 ,xs) as follows: 

t (x 0 ,Xi,X2,X 3 ) <- M • t (x 0 ,Xi,X2,X 3 ), 

where the operations between a matrix and a vector are performed over GF(2 m ). 
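
Since every entry of M is 0 or 1, the product over $GF(2^m)$ reduces to XORs: each output cell is the XOR of the other three input cells. A small Python sketch of this (the function name `mix_column` is illustrative, not taken from the paper):

```python
from functools import reduce
from operator import xor

# The binary matrix M from the text.
M = [
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
]

def mix_column(column):
    """Multiply a column of four m-bit cells by M; with 0/1 entries this is a XOR of selected cells."""
    return [reduce(xor, (cell for coeff, cell in zip(row, column) if coeff), 0) for row in M]

example = [0x1, 0x2, 0x4, 0x8]
once = mix_column(example)     # each cell becomes the XOR of the other three
twice = mix_column(once)       # applying M twice gives back the input
```

In `twice`, each original cell appears three times in its own position and twice in every other position, so only the original value survives, which is why M is involutive.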


Fig. 5. $SSb_0$, $SSb_1$, $SSb_2$ and $SSb_3$ 


3.2 Round Function 

The round function of Midori consists of an S-layer SubCell: $\{0,1\}^n \to \{0,1\}^n$, 
a P-layer ShuffleCell and MixColumn: $\{0,1\}^n \to \{0,1\}^n$, and a key-addition layer 
KeyAdd: $\{0,1\}^n \times \{0,1\}^n \to \{0,1\}^n$. Each layer updates an n-bit state S as 
follows. 

SubCell(S): $Sb_0$ and $SSb_i$ are applied to every 4-bit and 8-bit cell of the state S 
of Midori64 and Midori128 in parallel, respectively. Namely, $s_i \leftarrow Sb_0[s_i]$ for 
Midori64 and $s_i \leftarrow SSb_{(i \bmod 4)}[s_i]$ for Midori128, where $0 \le i \le 15$. 

ShuffleCell(S): Each cell of the state is permuted as follows: 

$$(s_0, s_1, \ldots, s_{15}) \leftarrow (s_0, s_{10}, s_5, s_{15}, s_{14}, s_4, s_{11}, s_1, s_9, s_3, s_{12}, s_6, s_7, s_{13}, s_2, s_8).$$

MixColumn(S): M is applied to every 4m-bit column of the state S, i.e., 
${}^t(s_i, s_{i+1}, s_{i+2}, s_{i+3}) \leftarrow M \cdot {}^t(s_i, s_{i+1}, s_{i+2}, s_{i+3})$ for i = 0, 4, 8, 12. 

KeyAdd(S, $RK_i$): The i-th n-bit round key $RK_i$ is XORed to a state S. 


3.3 Data Processing Part 

The data processing part of Midori for encryption, $MidoriCore_{(R)}$, performs as 
follows: 

$$MidoriCore_{(R)} : \{0,1\}^{16m} \times \{0,1\}^{16m} \times (\{0,1\}^{16m})^{R-1} \to \{0,1\}^{16m}, \quad (X, WK, RK_0, \ldots, RK_{R-2}) \mapsto Y$$

Algorithm $MidoriCore_{(R)}(X, WK, RK_0, \ldots, RK_{R-2})$: 

    S ← KeyAdd(X, WK) 
    for i = 0 to R − 2 do 
        S ← SubCell(S) 
        S ← ShuffleCell(S) 
        S ← MixColumn(S) 
        S ← KeyAdd(S, RK_i) 
    S ← SubCell(S) 
    Y ← KeyAdd(S, WK) 

where R = 16 for Midori64 and R = 20 for Midori128. Similarly, the inverse 
data processing part $MidoriCore^{-1}_{(R)}$ operates as follows: 

$$MidoriCore^{-1}_{(R)} : \{0,1\}^{16m} \times \{0,1\}^{16m} \times (\{0,1\}^{16m})^{R-1} \to \{0,1\}^{16m}, \quad (Y, WK, RK_{R-2}, \ldots, RK_0) \mapsto X$$

Algorithm $MidoriCore^{-1}_{(R)}(Y, WK, RK_{R-2}, \ldots, RK_0)$: 

    S ← KeyAdd(Y, WK) 
    for i = (R − 2) to 0 do 
        S ← SubCell(S) 
        S ← MixColumn(S) 
        S ← InvShuffleCell(S) 
        S ← KeyAdd(S, L^{-1}(RK_i)) 
    S ← SubCell(S) 
    X ← KeyAdd(S, WK) 

where $L^{-1}$ (the inverse of the linear layer) denotes the composition of the 
operations InvShuffleCell ∘ MixColumn, and InvShuffleCell permutes each cell of the 
state as follows. 

$$(s_0, s_1, \ldots, s_{15}) \leftarrow (s_0, s_7, s_{14}, s_9, s_5, s_2, s_{11}, s_{12}, s_{15}, s_8, s_1, s_6, s_{10}, s_{13}, s_4, s_3).$$
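
To illustrate how the two algorithms fit together, here is a hedged Python sketch of $MidoriCore_{(16)}$ and its inverse on a state of sixteen 4-bit cells. It is a structural illustration only: the round keys are arbitrary random values rather than the keys of Sect. 3.4, the state is handled as a plain list of nibbles (the byte-level loading convention is not modelled), and all function names are hypothetical. No claim is made that it reproduces the test vectors of Appendix A.

```python
import random

SB0 = [0xc, 0xa, 0xd, 0x3, 0xe, 0xb, 0xf, 0x7, 0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
SHUFFLE = [0, 10, 5, 15, 14, 4, 11, 1, 9, 3, 12, 6, 7, 13, 2, 8]
INV_SHUFFLE = [0, 7, 14, 9, 5, 2, 11, 12, 15, 8, 1, 6, 10, 13, 4, 3]

def sub_cell(s):
    return [SB0[c] for c in s]

def permute(s, perm):
    # New cell j takes the value of old cell perm[j], as in the ShuffleCell definition.
    return [s[p] for p in perm]

def mix_column(s):
    out = []
    for base in range(0, 16, 4):                 # columns are cells base..base+3
        col = s[base:base + 4]
        total = col[0] ^ col[1] ^ col[2] ^ col[3]
        out += [total ^ c for c in col]          # XOR of the other three cells
    return out

def key_add(s, k):
    return [a ^ b for a, b in zip(s, k)]

def midori_core(x, wk, round_keys):
    s = key_add(x, wk)
    for rk in round_keys:                        # RK_0 .. RK_{R-2}
        s = key_add(mix_column(permute(sub_cell(s), SHUFFLE)), rk)
    return key_add(sub_cell(s), wk)

def midori_core_inv(y, wk, round_keys):
    s = key_add(y, wk)
    for rk in reversed(round_keys):              # RK_{R-2} .. RK_0
        inv_rk = permute(mix_column(rk), INV_SHUFFLE)   # L^{-1}(RK_i)
        s = key_add(permute(mix_column(sub_cell(s)), INV_SHUFFLE), inv_rk)
    return key_add(sub_cell(s), wk)

rng = random.Random(2015)                        # fixed seed, arbitrary demo values
x = [rng.randrange(16) for _ in range(16)]
wk = [rng.randrange(16) for _ in range(16)]
rks = [[rng.randrange(16) for _ in range(16)] for _ in range(15)]   # R - 1 = 15 keys
y = midori_core(x, wk, rks)
recovered = midori_core_inv(y, wk, rks)
```

The round trip works because SubCell and MixColumn are involutions and the linear layer is linear over XOR, so adding $L^{-1}(RK_i)$ after InvShuffleCell ∘ MixColumn cancels the round key that encryption added after MixColumn ∘ ShuffleCell.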

3.4 Round Key Generation 

For Midori64, a 128-bit secret key K is denoted as two 64-bit keys $K_0$ and $K_1$ 
as $K = K_0 \| K_1$. Then, $WK = K_0 \oplus K_1$ and $RK_i = K_{(i \bmod 2)} \oplus \alpha_i$, where 
$0 \le i \le 14$. For Midori128, $WK = K$ and $RK_i = K \oplus \beta_i$, where $0 \le i \le 18$. The 
constants $\beta_i$ are defined in Table 4. It can be seen that the constants are in the 
form of 4x4 binary matrices. They are added bitwise to the LSB of every round 
key byte in Midori128 and round key nibble in Midori64, respectively. Note that 
$\alpha_i = \beta_i$ for $0 \le i \le 14$. 


3.5 Midori Ciphers 

Midori block ciphers are composed of two variants: Midori64 and Midori128, 
consisting of $MidoriCore_{(16)}$ with m = 4 and $MidoriCore_{(20)}$ with m = 8, respectively. 
$MidoriCore_{(16)}$ is depicted in Fig. 6 as an example. 

Table 4. The round constants $\beta_i$ (each entry lists the four rows of the 4x4 binary matrix, top to bottom) 

 i    beta_i 
 0    0010 0100 0011 1111 
 1    0110 1010 1000 1000 
 2    1000 0101 1010 0011 
 3    0000 1000 1101 0011 
 4    0001 0011 0001 1001 
 5    1000 1010 0010 1110 
 6    0000 0011 0111 0000 
 7    0111 0011 0100 0100 
 8    1010 0100 0000 1001 
 9    0011 1000 0010 0010 
10    0010 1001 1001 1111 
11    0011 0001 1101 0000 
12    0000 1000 0010 1110 
13    1111 1010 1001 1000 
14    1110 1100 0100 1110 
15    0110 1100 1000 1001 
16    0100 0101 0010 1000 
17    0010 0001 1110 0110 
18    0011 1000 1101 0000 

Fig. 6. Overview of Midori64 


4 Design Decision 

Here, we explain our design decisions vis-a-vis the observations of Sect. 2. 


4.1 Linear Layer 

The linear layer of each variant consists of a cell-permutation (ShuffleCell) and 
four 4x4 matrix operations (MixColumn). Those operations are performed over 
$GF(2^4)$ and $GF(2^8)$ for the 64-bit and 128-bit variants, respectively. 


MDS vs Almost MDS. Using the NanGate 45 nm open cell library, Table 5 
compares three types of 4x4 matrices, involutive MDS ($M_A$), non-involutive 
MDS ($M_B$) and involutive almost MDS matrices ($M_C$), from implementation 
aspects. These matrices are considered lightweight in each of the three 
aforementioned criteria [26,31]. 

$$M_A = \begin{pmatrix} 1 & 2 & 6 & 4 \\ 2 & 1 & 4 & 6 \\ 6 & 4 & 1 & 2 \\ 4 & 6 & 2 & 1 \end{pmatrix}, \quad M_B = \begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 3 & 1 \\ 1 & 1 & 2 & 3 \\ 3 & 1 & 1 & 2 \end{pmatrix}, \quad M_C = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}$$

From Table 5, $M_C$ is obviously preferable over the others in terms of the gate 
size and the path delay. In fact, circulant-type almost MDS matrices are adopted 
in PRINCE [12], PRIDE [1], FIDES [8] and CLOC [20]. Moreover, Khoo et al. showed 
that, for a 64-bit block size employing the AES-like structure, the combination of 
4x4 almost MDS matrices ($M_C$) with ShiftRow and 16 4-bit S-boxes is the most 
efficient in both a round-based and a serialized implementation by proposing a 
new comparison metric FOAM (figure of adversarial merit), which combines the 
inherent security provided by cryptographic structures and components along 
with their implementation properties [22]. 

Table 5. Comparison of three matrices 

              M_A     M_B     M_C 
Area [GE]     108     104     48 
Delay [ns]    0.93    0.68    0.37 
Diffusion     MDS     MDS     Almost MDS 
Involution    yes     no      yes 

Table 6. Comparison of S-boxes 

              PRESENT   PRINCE   Sb_0    Sb_1 
Area [GE]     24.33     16       13.3    15.33 
Delay [ns]    0.47      0.36     0.24    0.32 
Involution    No        No       Yes     Yes 

While $M_C$ has efficient implementation properties, its diffusion speed is 
slower and the minimum number of active S-boxes in each round is smaller 
than those of ciphers employing MDS matrices, due to its lower branch number. 
These properties are known to be directly related to the immunity 
against several attacks, including impossible differential, saturation, differential 
and linear attacks. To improve the security of the almost MDS matrix with low 
implementation overhead, we adopt optimal cell-permutation layers, which are aimed 
at improving the diffusion speed and increasing the number of active S-boxes in 
each round. The diffusion speed is measured by the number of rounds taken to 
attain full diffusion, which is the property that all output cells are affected by 
all input cells. Importantly, changing cell-permutation patterns generally does 
not require additional implementation costs in a round-based or an unrolled 
hardware implementation. 


Approach to Find Optimal Cell-Permutation Layers for Almost MDS. 

Since it is computationally hard to exhaustively count the minimum number 
of active S-boxes for all possible permutations ($16! \approx 2^{44.25}$) by Matsui's 
search approach [9,25], we take the following two-step approach to reduce the 
search space. In the first step, we restrict the cell-permutations to row-based 
cell-permutations which permute four cells in each row, e.g. ShiftRow in AES. The 
number of possible row-based cell-permutations is $(4!)^4 \approx 2^{18.3}$. 
This step is based on the fact that the full diffusion property relies only on the 
row-based property of the cell-permutation. As a result of our searches, we find that 
a class of row-based cell-permutations achieves full diffusion in 3 rounds, and its 
necessary and sufficient condition is as follows. 

Condition 1 (3-round full diffusion). For a 4x4 cell-array, after applying a 
cell-permutation once and twice, each input cell in a column is mapped into a 
cell in a different column. 

From our search, 576 row-based cell-permutations satisfy Condition 1. 
Interestingly, the ShiftRow-type permutation is not included in this class, i.e. it 
requires 4 rounds for full diffusion. 
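
The notion of diffusion speed can be made concrete with a small dependency simulation. The Python sketch below tracks, for every output cell, the set of input cells it structurally depends on, using the rule that the almost MDS matrix makes each cell depend on the other three cells of its column. It compares the ShuffleCell permutation of Sect. 3.2 with an AES-style ShiftRow, which is an assumption about what "ShiftRow-type" means here; the function name is hypothetical, and the set-union model ignores possible cancellations.

```python
# ShuffleCell from Sect. 3.2: new cell j takes old cell MIDORI[j] (column-major indexing).
MIDORI = [0, 10, 5, 15, 14, 4, 11, 1, 9, 3, 12, 6, 7, 13, 2, 8]

# AES-style ShiftRow on a column-major 4x4 state: row r is rotated by r columns.
SHIFTROW = [r + 4 * ((c + r) % 4) for c in range(4) for r in range(4)]

def rounds_to_full_diffusion(perm, max_rounds=10):
    """Rounds until every cell depends on all 16 input cells (structural model)."""
    deps = [{j} for j in range(16)]
    for rnd in range(1, max_rounds + 1):
        shuffled = [deps[p] for p in perm]                   # cell permutation
        deps = []
        for j in range(16):
            base = 4 * (j // 4)                              # first cell of this column
            others = [shuffled[k] for k in range(base, base + 4) if k != j]
            deps.append(set().union(*others))                # almost MDS: other three cells
        if all(len(d) == 16 for d in deps):
            return rnd
    return None

midori_rounds = rounds_to_full_diffusion(MIDORI)
shiftrow_rounds = rounds_to_full_diffusion(SHIFTROW)
```

Under this model the Midori permutation reaches full diffusion after 3 rounds, while the AES-style ShiftRow needs 4, in line with the statement above.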

In the second step, we add a column-based cell-permutation, which permutes 
four cells in each column, after applying the class of permutations satisfying 
Condition 1. The target cell permutation consists of the combination of the 
row-based and column-based permutations. Note that adding a column-based 
cell-permutation to the row-based permutations satisfying Condition 1 does not 
affect the full diffusion property. The number of all possible cell-permutations 
of this class is $576 \times (4!)^4 \approx 2^{27.51}$. Consequently, we find a 
class of cell-permutations achieving the largest number of active S-boxes in each 
round and the smallest number of rounds to attain full diffusion when satisfying 
Condition 1 and the following Condition 2 or 3. 

Condition 2 (The number of active S-boxes). For a 4x4 cell-array, after applying 
a cell-permutation twice and twice inversely, each input cell in a column is 
mapped into a cell in the same row. 

Condition 3 (The number of active S-boxes). For a 4x4 cell-array, after applying 
a cell-permutation once and three times inversely, each input cell in a column 
is mapped into a cell in the same row. 

The numbers of cell-permutations satisfying Conditions 2 and 3 are both 576. We 
define these 1152 cell-permutations as optimal cell-permutations. Table 7 shows 
the minimum numbers of differentially/linearly active S-boxes of the optimal 
cell-permutations and the ShiftRow-type permutation. Our optimal cell-permutations 
drastically improve the minimum number of differentially/linearly active S-boxes 
in each round while keeping the 3-round full diffusion property. Thus, our optimal 
permutations achieve security against several attacks such as differential/linear 
and impossible differential attacks in the same number of rounds compared to the 
ShiftRow-type permutation. Midori128 and Midori64 adopt one of the optimal cell 
permutations satisfying both Conditions 1 and 2 as follows. 

$$(s_0, s_1, \ldots, s_{15}) \leftarrow (s_0, s_{10}, s_5, s_{15}, s_{14}, s_4, s_{11}, s_1, s_9, s_3, s_{12}, s_6, s_7, s_{13}, s_2, s_8).$$


Table 7. Minimum number of differentially/linearly active S-boxes (AS) of Midori64 and Midori128 

Round number                                 4   5   6   7   8   9  10  11  12  13  14  15  16 
Min. # of AS (optimal cell-permutation)     16  23  30  35  38  41  50  57  62  67  72  75  84 
Min. # of AS (ShiftRow-type permutation)    16  18  20  26  32  34  36  42  48  50  52  58  64 

Starting from the state $S_0$, each cell of $S_0$ is mapped to $S_1$, $S_2$, $S_{-1}$ and $S_{-2}$ 
after applying the above cell-permutation once, twice, once inversely and twice 
inversely, respectively, as follows. 

$$S_0 = \begin{pmatrix} s_0 & s_4 & s_8 & s_{12} \\ s_1 & s_5 & s_9 & s_{13} \\ s_2 & s_6 & s_{10} & s_{14} \\ s_3 & s_7 & s_{11} & s_{15} \end{pmatrix}, \quad S_1 = \begin{pmatrix} s_0 & s_{14} & s_9 & s_7 \\ s_{10} & s_4 & s_3 & s_{13} \\ s_5 & s_{11} & s_{12} & s_2 \\ s_{15} & s_1 & s_6 & s_8 \end{pmatrix}, \quad S_2 = \begin{pmatrix} s_0 & s_2 & s_3 & s_1 \\ s_{12} & s_{14} & s_{15} & s_{13} \\ s_4 & s_6 & s_7 & s_5 \\ s_8 & s_{10} & s_{11} & s_9 \end{pmatrix},$$

$$S_{-1} = \begin{pmatrix} s_0 & s_5 & s_{15} & s_{10} \\ s_7 & s_2 & s_8 & s_{13} \\ s_{14} & s_{11} & s_1 & s_4 \\ s_9 & s_{12} & s_6 & s_3 \end{pmatrix}, \quad S_{-2} = \begin{pmatrix} s_0 & s_2 & s_3 & s_1 \\ s_{12} & s_{14} & s_{15} & s_{13} \\ s_4 & s_6 & s_7 & s_5 \\ s_8 & s_{10} & s_{11} & s_9 \end{pmatrix}$$

From those mappings, it is clear that the relation among $S_{-2}$, $S_0$ and $S_2$ satisfies 
Condition 2. Similarly, all of the pairs $(S_{-2}, S_{-1})$, $(S_{-1}, S_0)$, $(S_0, S_1)$, $(S_1, S_2)$ 
satisfy Condition 1. 
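
These mappings can be recomputed directly from the cell-permutation. The Python sketch below does so with cell labels 0..15 in place of $s_0, \ldots, s_{15}$ (column-major, so label j sits in column j // 4 and row j % 4). The two predicate functions encode one reading of Conditions 1 and 2 (cells that start in one column end up in four different columns, respectively in one row); that reading and the helper names are assumptions of this sketch.

```python
P = [0, 10, 5, 15, 14, 4, 11, 1, 9, 3, 12, 6, 7, 13, 2, 8]   # the adopted cell-permutation

def apply(perm, state):
    """New cell j takes the old cell perm[j]."""
    return [state[p] for p in perm]

def invert(perm):
    inv = [0] * len(perm)
    for j, p in enumerate(perm):
        inv[p] = j
    return inv

def columns_spread(state):
    """Condition 1 reading: the four labels of each original column lie in four different columns."""
    pos = {label: j for j, label in enumerate(state)}
    return all(len({pos[4 * c + r] // 4 for r in range(4)}) == 4 for c in range(4))

def columns_to_rows(state):
    """Condition 2 reading: the four labels of each original column lie in a single row."""
    pos = {label: j for j, label in enumerate(state)}
    return all(len({pos[4 * c + r] % 4 for r in range(4)}) == 1 for c in range(4))

Q = invert(P)                       # InvShuffleCell
S0 = list(range(16))
S1, Sm1 = apply(P, S0), apply(Q, S0)
S2, Sm2 = apply(P, S1), apply(Q, Sm1)
```

Note that `S2` and `Sm2` coincide, i.e. applying the permutation four times gives the identity, which is why the matrices for $S_2$ and $S_{-2}$ above are equal.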


4.2 S-Box Layer 

According to the analysis of Sect. 2.1, 4-bit S-boxes are usually more efficient than 
8-bit S-boxes in terms of energy consumption per cycle. Also, a small path 
delay and a small gate area lead to a low-energy implementation. To optimize 
the S-layer regarding energy consumption, we aim to develop a small-delay and 
lightweight 4-bit S-box which fulfills the following requirements: (1) the maximal 
probability of a differential is $2^{-2}$, (2) the maximal absolute bias of a linear 
approximation is $2^{-2}$ and (3) involution. Requirement (3) enables us to 
reduce the number of possible S-boxes from $2^{44.25}$ to $2^{25.5}$. 


Approach to Find Small-Delay and Lightweight 4-Bit S-Box. Our approach 
starts with a key observation that the path delay is highly related to the 
dependency of the computation. We introduce a metric, depth, to estimate the 
path delay of S-boxes. 

Definition 1 (depth): The depth is defined as the sum of sequential path delays 
of basic operations AND, OR, NAND, NOR and NOT. 

Example. The depth of the computation of $(x \oplus y) \cdot z$ is estimated as the sum of 
path delays of XOR and AND, because the "$\cdot z$" operation is feasible only after the 
computation of $(x \oplus y)$. 

In our search, we assume that depths of XOR, AND/OR, NAND/NOR and 
NOT are weighted as 2, 1.5, 1 and 0.5, respectively, based on the number of 
transistors traversed sequentially in the operation. The gate areas 
of NOT, NAND/NOR, AND/OR and XOR/XNOR are estimated as 0.5, 1, 1.5 
and 2 GE, respectively. We search all S-boxes whose depth is 1, 1.5, 2, ..., and 
check whether the S-boxes satisfy our security requirements. As a result, we 
find $Sb_0$ (see Table 3), whose depth and gate size are the lowest and the smallest 
ones in our search. $Sb_0$ can be expressed as follows, where inputs and outputs 
are defined as {a, b, c, d} and {a', b', c', d'}, and a and a' are the most significant 
bits. 

a' = (c NAND (a NAND b)) NAND (a OR d) 

b' = ((a NOR d) NOR (b AND c)) NAND ((a AND c) NAND d) 

c' = (b NAND d) NAND ((b NOR d) OR a) 

d' = (a NOR (b OR c)) NOR ((a NAND b) NAND (c OR d)) 

For instance, let us consider the computation of c'. In this computation, 
(b NAND d) and (b NOR d) can be done at first. After that, the computation 
of (b NOR d) OR a is done. Then, the last operation of NAND is executable. Thus, 
the depth of c' is estimated as 3.5 (= 1 + 1.5 + 1). The depths of the remaining 
a', b' and d' are also estimated as 3.0 or 3.5. 

Table 8. Input-output bit relations of each S-box 

      SSb_0           SSb_1           SSb_2           SSb_3 
A     (1, 3, 4, 6)    (0, 1, 6, 7)    (1, 2, 3, 4)    (1, 2, 4, 7) 
B     (0, 2, 5, 7)    (2, 3, 4, 5)    (0, 5, 6, 7)    (0, 3, 5, 6) 
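
The depth estimates quoted for the $Sb_0$ expressions can be reproduced with a few lines of Python. This is only a sketch of the metric of Definition 1 with the stated weights; the nested-tuple encoding of the expressions and the function name `depth` are choices made for this illustration.

```python
# Depth weights stated in the text.
WEIGHT = {"XOR": 2.0, "AND": 1.5, "OR": 1.5, "NAND": 1.0, "NOR": 1.0, "NOT": 0.5}

def depth(expr):
    """Depth of an expression tree: gate weight plus the deepest input; variables cost 0."""
    if isinstance(expr, str):
        return 0.0
    op, *args = expr
    return WEIGHT[op] + max(depth(a) for a in args)

# Sb_0 output bits as nested (gate, input, input) tuples.
SB0_EXPR = {
    "a'": ("NAND", ("NAND", "c", ("NAND", "a", "b")), ("OR", "a", "d")),
    "b'": ("NAND", ("NOR", ("NOR", "a", "d"), ("AND", "b", "c")),
                   ("NAND", ("AND", "a", "c"), "d")),
    "c'": ("NAND", ("NAND", "b", "d"), ("OR", ("NOR", "b", "d"), "a")),
    "d'": ("NOR", ("NOR", "a", ("OR", "b", "c")),
                  ("NAND", ("NAND", "a", "b"), ("OR", "c", "d"))),
}

depths = {name: depth(expr) for name, expr in SB0_EXPR.items()}
xor_and_example = depth(("AND", ("XOR", "x", "y"), "z"))   # the (x XOR y) AND z example
```

For c' the longest chain is NOR, OR, NAND, giving 1 + 1.5 + 1 = 3.5, and the other three outputs come out at 3.0 or 3.5 as stated.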

Considering the additional requirement of the full diffusion property, we find $Sb_1$, which 
has the lowest depth and the smallest gate area among 4-bit bijective S-boxes 
satisfying requirements (1), (2), (3) and the full diffusion property. $Sb_1$ is 
expressed as follows: 

a' = ((b NAND c) NAND a) NAND ((a NOR d) NAND b) 
b' = ((a XOR c) NOR b) NOR ((b NAND c) AND d) 
c' = (c NAND d) NAND ((a XOR b) NAND (b OR d)) 
d' = ((a NAND b) NAND c) NAND (b OR d) 

Note that an S-box satisfies the full diffusion property if and only if every input 
in {a, b, c, d} of the S-box non-linearly affects all outputs {a', b', c', d'}. This full 
diffusion property enables us to ensure a 3-round property regarding the diffusion 
in Midori128 (we explain it at the end of this section). 


Evaluation. Table 6 shows the comparison of the S-boxes of PRESENT, PRINCE, $Sb_0$ 
and $Sb_1$ using the NanGate 45 nm open cell library. The path delay of $Sb_0$ is about 1.5 
times and 2 times smaller than those of the PRINCE and PRESENT S-boxes, respectively, 
and its gate size is also smaller than the others. Those of $Sb_1$ are comparable to 
PRINCE's S-box. Additionally, $Sb_0$ and $Sb_1$ have the involution property. 

8-Bit S-Boxes Based on 4-Bit S-Boxes. From the observation in Sect. 2.1, 
we adopt 8-bit S-boxes consisting of two 4-bit S-boxes processed in parallel to 
minimize the path delay in the round-based implementation. Moreover, in order 
to avoid having the unfavorable independent property exploited in the full-round 
attack on KLEIN [23], we add properly chosen bit-permutations to the beginning 
and the end of the 8-bit S-boxes as shown in Fig. 5. As described in Sect. 3.1, each 
output bit-permutation is the inverse of the corresponding input bit-permutation 
to keep the involution property. With a property of our P-layer and those 
bit-permutations, we claim that no independent property is found after 3 rounds in 
Midori128. Since $Sb_1$ has the full diffusion property, any input bit of $SSb_i$ affects 
the corresponding 4 output bits as shown in Table 8. For example, in $SSb_1$, any 
input bit at a position in {0, 1, 6, 7} affects all output bits at the positions 
{0, 1, 6, 7}. We choose bit-permutations for $SSb_0$, $SSb_1$, $SSb_2$ and $SSb_3$ so that 
those satisfy the following property. 

Property 1. Affected 4-bit positions of outputs of an S-box are included in both 
of two different input groups of the other three S-boxes. 

For example, the group A of $SSb_1$ is {0, 1, 6, 7}. Then, those bit positions are 
found in both groups A and B of $SSb_0$. This implies that the {0, 1, 6, 7}-th input 
bits of $SSb_0$ affect all 8 output bits. For the matrix operation 
${}^t(y_0, y_1, y_2, y_3) \leftarrow M \cdot {}^t(x_0, x_1, x_2, x_3)$, we have the following property. 

Property 2. Each input cell affects three cells in the different cell positions 
from the input. 

For instance, $x_0$ deterministically affects $y_1$, $y_2$ and $y_3$, and does not affect $y_0$. 
From Properties 1 and 2, we obtain the following theorem. 

Theorem 1. In Midori128, any input bit nonlinearly affects all 128 bits of the 
state after 3 rounds. 

Proof. An input bit affects 4 bits in the corresponding cell after the first S-layer 
due to the full diffusion property of $Sb_1$. From Property 2, the affected 4 bits 
in the cell are diffused to three cells in the same column but in different cell 
positions after MixColumn. Note that, in the three affected cells, the affected bit 
positions are the same. From Property 1, in each of the three affected cells, the affected 
4 bits are spread over all 8 bits in the cell after the 2nd S-layer. Therefore, all 
bits are affected by any input after 3 rounds (see Fig. 7). □ 
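
Property 1 can be checked from the input bit permutations of Sect. 3.1. The Python sketch below assumes that the first four permuted bits feed one $Sb_1$ instance (group A) and the last four feed the other (group B), which matches the groups listed in Table 8; the function name and the reading of Property 1 as "every group of one S-box intersects both groups of every other S-box" are assumptions of this illustration.

```python
# Input bit permutations p_0..p_3 from Sect. 3.1 (bit 0 = MSB).
P = [
    [4, 1, 6, 3, 0, 5, 2, 7],
    [1, 6, 7, 0, 5, 2, 3, 4],
    [2, 3, 4, 1, 6, 7, 0, 5],
    [7, 4, 1, 2, 3, 0, 5, 6],
]

# Group A = bit positions routed to the first Sb_1, group B = positions routed to the second.
GROUPS = [(set(p[:4]), set(p[4:])) for p in P]

def satisfies_property_1(groups):
    """Each group of SSb_i must overlap both groups of every other SSb_j."""
    for i, groups_i in enumerate(groups):
        for j, groups_j in enumerate(groups):
            if i == j:
                continue
            for out_group in groups_i:
                if not all(out_group & in_group for in_group in groups_j):
                    return False
    return True

property_1_holds = satisfies_property_1(GROUPS)
```

For the example in the text, group A of $SSb_1$ is {0, 1, 6, 7}, which meets group A of $SSb_0$ in {1, 6} and group B in {0, 7}.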


Fig. 7. Theorem 1: 3-round full diffusion property 


4.3 Key Scheduling Function 

To save energy, Midori128 does not employ any key scheduling function. The 
same 128-bit key is used as the whitening key and to generate the round keys. To 
make an efficient circuit for decryption, the i-th round key for decryption is defined 
as $L^{-1}(K) \oplus L^{-1}(\beta_{18-i})$, where $L^{-1}$ denotes the inverse of the linear layer. 
Computation of $L^{-1}(K)$ involves a one-time computation with the key at the beginning 
of the decryption function and so does not consume any significant energy. The round 
key generation of Midori64 is slightly more complicated, as it involves selecting 
$K_0$ and $K_1$, i.e. the most significant and least significant halves of the 128-bit 
key, in alternate rounds. This can be achieved by the use of a single multiplexer. 
For efficient decryption, a one-time computation of $L^{-1}(K_0)$ and $L^{-1}(K_1)$ can 
be done at the beginning of the algorithm, which again does not consume any 
significant energy. 


4.4 Round Constant 

Both Midori128 and Midori64 use 4x4 binary matrices as round constants. The 
constants have been derived from the hexadecimal encoding of the fractional 
part of $\pi$ = 3.243f 6a88 85a3 ... . For example, the 1st, 2nd, 3rd and 4th rows 
of $\beta_0$, when read as 4-bit binary constants, are the encodings of the hex values 
2, 4, 3, f, respectively. Similarly for the other $\beta_i$'s. These are added bitwise to 
the LSB of each round key byte in Midori128 and round key nibble in Midori64. 
The round constants were chosen in this manner with a view to having an 
energy-efficient decryption circuit. Both $\beta_i$ and $L^{-1}(\beta_i)$ are 4x4 binary matrices, and 
so in both Midori128 and Midori64, the round constant addition requires a total 
of 16 XOR gates only. The constants $\beta_i$ and $L^{-1}(\beta_i)$ can be stored in lookup 
tables and filtered accordingly in each round. 
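
The derivation of the constants from $\pi$ can be sketched in Python as follows. Only the first twelve hexadecimal digits are quoted in the text; the remaining digits below are the standard hexadecimal expansion of the fractional part of $\pi$ and are supplied here as an assumption, chosen so that the 19 constants of Table 4 are covered. The function name is hypothetical.

```python
# Fractional part of pi in hexadecimal, 19 * 4 = 76 digits (digits beyond
# "243f6a8885a3" are not quoted in the text and are filled in as an assumption).
PI_FRACTION_HEX = (
    "243f6a88" "85a308d3" "13198a2e" "03707344" "a4093822"
    "299f31d0" "082efa98" "ec4e6c89" "452821e6" "38d0"
)

def round_constant(i):
    """Return beta_i as four 4-bit row strings, one row per hexadecimal digit."""
    digits = PI_FRACTION_HEX[4 * i: 4 * i + 4]
    return [format(int(d, 16), "04b") for d in digits]

beta = [round_constant(i) for i in range(19)]
```

Here `beta[0]` consists of the rows 0010, 0100, 0011, 1111, i.e. the hex values 2, 4, 3, f mentioned above, and the remaining entries can be compared against Table 4.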

5 Security Evaluation 

5.1 Differential/Linear Cryptanalysis 

The minimum number of differentially and linearly active S-boxes for each number 
of rounds is estimated as shown in Table 7. The maximum differential and linear 
probabilities of $Sb_0$, $SSb_0$, $SSb_1$, $SSb_2$ and $SSb_3$ are all $2^{-2}$. Midori64 and 
Midori128 have more than 32 and 64 active S-boxes after 7 and 13 rounds, respectively. 
Thus, we expect that variants of Midori64 and Midori128 reduced to 7 rounds 
and 13 rounds do not have any differential and linear trails whose probabilities 
are higher than $2^{-64}$ and $2^{-128}$, respectively. 

5.2 Boomerang-Type Attack 

The boomerang-type attacks first divide the cipher into two sub-ciphers, then 
find a boomerang quartet with high probability. The probability of constructing 
a boomerang quartet is denoted as $\hat{p}^2\hat{q}^2$, where $\hat{p} = \sqrt{\sum_{\beta} \Pr^2[\alpha \to \beta]}$, 
$\alpha$ and $\beta$ are the input and output differences for the first sub-cipher, and $\hat{q}$ is 
defined likewise for the second sub-cipher. $\hat{p}^2$ is bounded by the maximum differential 
trail probability, i.e., $\hat{p}^2 \le \max_{\beta} \Pr[\alpha \to \beta]$, and $\hat{q}^2$ as well. Let p, q be the maximum 
differential trail probabilities for the first and the second sub-ciphers. Then, p and q 
are bounded using the minimum number of active S-boxes in each 
sub-cipher. From Table 7, any combination of two sub-ciphers making up 
Midori64 and Midori128 after 8 and 14 rounds has at least 32 and 64 active 
S-boxes in total, respectively. Note that these bounds for boomerang attacks are very 
conservative, i.e., they require the unrealistic assumptions $\hat{p}^2 = p$ and $\hat{q}^2 = q$. 
Actually, in our active S-box search, we did not find such special events. Thus, 
we expect that far fewer rounds than 8 and 14 are secure against 
boomerang-type attacks. 
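
The round counts quoted in Sects. 5.1 and 5.2 follow from Table 7 by simple arithmetic, sketched below in Python. The sketch uses only the optimal cell-permutation row of Table 7 and the per-S-box bound $2^{-2}$; because Table 7 starts at 4 rounds, the boomerang helper only considers splits in which both sub-ciphers have at least 4 rounds, which is a limitation of this illustration rather than of the paper's argument. Function names are hypothetical.

```python
# Table 7, optimal cell-permutation row: rounds -> minimum number of active S-boxes.
ACTIVE = dict(zip(range(4, 17), [16, 23, 30, 35, 38, 41, 50, 57, 62, 67, 72, 75, 84]))
LOG2_P_MAX = -2          # maximum differential/linear probability of one S-box is 2^-2

def min_rounds_below(block_bits):
    """Smallest tabulated round count whose trail bound 2^(-2*AS) is below 2^(-block_bits)."""
    for rounds in sorted(ACTIVE):
        if LOG2_P_MAX * ACTIVE[rounds] < -block_bits:
            return rounds
    return None

def min_active_over_splits(total_rounds):
    """Minimum total active S-boxes over two-part splits covered by Table 7 (each part >= 4 rounds)."""
    return min(ACTIVE[r] + ACTIVE[total_rounds - r]
               for r in ACTIVE if (total_rounds - r) in ACTIVE)

rounds_64 = min_rounds_below(64)       # Midori64
rounds_128 = min_rounds_below(128)     # Midori128
split_8 = min_active_over_splits(8)
split_14 = min_active_over_splits(14)
```

With these inputs the thresholds are reached at 7 and 13 rounds, and the tabulated splits of 8 and 14 rounds give 32 (4 + 4) and 64 (5 + 9) active S-boxes.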


5.3 Impossible Differential Attacks 

Midori64 and Midori128 achieve the 3-round full diffusion property. Thus, the 
differences of all cells in a state become unknown after the SubCell of the 4th round, i.e., 
there is no probability-one (truncated) differential characteristic. Following 
the miss-in-the-middle approach, the maximum number of rounds of impossible 
differential characteristics is estimated as 7 rounds. 

In order to obtain the lower bound on the number of rounds of impossible differentials, we 
try to find actual impossible differential characteristics. We utilize several 
deterministic properties of the four binary matrices M. This approach was also adopted 
in the security evaluation of FIDES [8]. As a result, we find 6-round impossible 
differentials such that, if only one cell is active at the input, 6 rounds of Midori64 and 
Midori128 never produce only one active cell. We believe that the full-round 
Midori64 and Midori128 have a sufficient number of rounds as the security margin. 


5.4 Meet-in-the-Middle Attacks 

The 3-round full diffusion property with our S-boxes enables us to claim that any 
inserted key bit of {$K_0$, $K_1$} or K non-linearly affects all bits of the state after 
3 rounds in the forward and the backward directions in Midori64 and Midori128, 
respectively. Thus, the number of rounds used for the partial matching (PM) [2] is 
upper bounded by 5 (= (3 − 1) + (3 − 1) + 1). The condition for the initial structure 
(IS) [29], also called independent biclique [10], is that key differential trails in 
the forward direction and those in the backward direction do not share active 
non-linear components. For Midori64 and Midori128, since any key differential 
affects all 16 S-boxes after at least 4 rounds in the forward and the backward 
directions, there is no such differential which shares active S-boxes in more than 4 
rounds. Thus, the number of rounds used for IS is upper bounded by 3. Assuming 
that the splice-and-cut technique allows an attacker to add 3 more rounds in 
the worst case, at most an 11-round (3 + 3 + 5) MitM attack may be feasible. 
However, because of the whitening keys at the beginning and the end and the actual 
constraint of key orders, we consider that it is difficult to construct 11-round attacks on 
Midori64 and Midori128. 

5.5 Other Attacks 

We also consider other types of attacks, including an integral differential, a 
truncated differential, a slide, a reflection, and an algebraic attack. Consequently, we 
expect that none of them works better than brute-force attacks. 

6 Implementation 

The main design objectives of Midori were first to achieve efficiency in energy 
consumption and second to provide both the encryption and decryption 
functionalities with minimal overhead. In this context, it is essential to have a 
round-based design optimal in terms of energy consumption, since unrolled designs are 
unlikely to be efficient in terms of energy consumption. The S-box and the 
MixColumn layer were specifically chosen for their energy efficiency and their 
involutive property. Both these layers have very small logic depth, which makes the 
energy consumption per round as small as possible. Structurally, MidoriCore 
and $MidoriCore^{-1}$ differ only in the order of application of the ShuffleCell, 
MixColumn and InvShuffleCell operations. Hence, the circuit for the round-based 
implementation of the cipher that accommodates both encryption and decryption 
can be realized as shown in Fig. 8. 

Since the ShuffleCell operation (Sh) and MixColumn (MC) do not commute, 
the linear layer, which is the composition MC ∘ Sh (= L, say), must 
be inverted during decryption by $L^{-1} = Sh^{-1} \circ MC$. In hardware, this can 
be achieved in two ways. The first involves filtering the outputs of the L and 
$L^{-1}$ operations through a single multiplexer. This requires two instances of the 
MixColumn logic in the circuit, and since this layer is the most expensive in 
terms of area and energy consumed, it is not the most efficient way to achieve 
this functionality. The second method, which is better in terms of both area 
and energy, is the one shown in Fig. 8. This involves using two multiplexers for 
filtering the outputs of the Sh and $Sh^{-1}$ operations and a single instance of the 
MixColumn logic. To perform the decryption operation using this circuit, the 
round key needs to be changed to $L^{-1}(K)$, and correspondingly the i-th round 
constant to $L^{-1}(\beta_{18-i})$. The first involves a cheap one-time change to the master 
key, while keeping the whitening key constant. The round constant functionality 
can be achieved by employing two lookup tables, one each for encryption and 
decryption, and filtering the appropriate round constant through a multiplexer. 
The round constants have been chosen in a manner so that both $\beta_i$ and $L^{-1}(\beta_i)$ 
are 4x4 binary matrices, and so this layer requires a total of 16 XOR gates only. 
The circuit for the 64-bit variant is the same as in Fig. 8, except that it requires 
an extra filtering between $K_0$ and $K_1$ (the most and least significant halves of 
the secret key) in alternate rounds. 

Fig. 8. The round based encryption/decryption architecture 

Table 9. A comparison of energy consumption of Midori with selected ciphers for the 
STM 90 nm Logic Process. (Average Power reported at 10 MHz; ED = encryption and 
decryption, E = encryption only) 

#  Cipher          Block  Arch.  Area    Energy  Energy/bit  Avg. Power  Critical Path 
                   Size          (GE)    (pJ)    (pJ)        (µW)        (ns) 
1  AES             128    ED     21274   769.0   6.01        699.1       4.08 
                          E      12459   350.7   2.74        318.8       3.32 
2  NOEKEON         128    ED     3439    331.5   2.59        184.2       3.79 
                          E      2284    338.0   2.64        187.8       3.38 
3  SIMON 128/128   128    ED     3480    855.6   6.68        124.0       2.67 
                          E      2420    664.1   5.19        96.2        2.66 
4  Midori128       128    ED     3661    228.3   1.78        108.7       2.44 
                          E      2522    187.3   1.46        89.2        2.25 
5  PRESENT         64     ED     2186    250.2   3.91        75.8        2.32 
                          E      1440    172.3   2.69        52.2        2.09 
6  PRINCE          64     ED     2650    146.3   2.29        112.5       4.09 
                          E      2286    144.7   2.26        111.3       4.06 
7  Midori64        64     ED     2450    121.0   1.89        71.2        2.12 
                          E      1542    103.0   1.61        60.6        2.06 


6.1 Evaluation 

All the designs were initially implemented in VHDL and the functional 
verification was done using Mentor Graphics ModelSim SE software. The designs 
were then synthesized using the Synopsys Design Compiler for the Standard Cell 
library of the STM 90 nm Logic Process: CORE90GPHVT v 2.1.a. 

The switching activity file was then generated by performing a timing 
simulation on the synthesized netlist using the Synopsys VCS software. The energy 
was then estimated with the Synopsys Power Compiler by using the switching 
activity file. An operating frequency of 10 MHz was used in all the simulations, 
since the effect of the leakage power is minimal at this frequency, and so the 
energy consumed is more or less independent of the clock frequency. The results 
of the simulation for the 90 nm logic process are presented in Table 9 along 
with similar evaluations for AES, NOEKEON, SIMON 128/128, PRESENT and PRINCE. 
It can be seen that Midori128/Midori64 performs better than NOEKEON/PRINCE, 
which were also designed to make the combined functionalities of encryption and 
decryption easily available. In Fig. 9 we compare the energy/bit consumption of 
the ED architectures of all the seven ciphers along with the cumulative latency 
figure (calculated as critical path x number of rounds). It can be seen that 
Midori128 and Midori64 fare optimally with respect to both parameters. 
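
The two quantities compared in Fig. 9 can be recomputed from Table 9. The Python sketch below uses the ED rows only; energy per bit is the energy divided by the block size, and cumulative latency is the critical path times the number of rounds. Round counts are taken from Sect. 3.3 for the two Midori variants only, since the round counts used for the other ciphers are not stated in this section. Variable names are illustrative.

```python
# ED rows of Table 9: cipher -> (block size in bits, energy in pJ, critical path in ns).
ED_ROWS = {
    "AES":           (128, 769.0, 4.08),
    "NOEKEON":       (128, 331.5, 3.79),
    "SIMON 128/128": (128, 855.6, 2.67),
    "Midori128":     (128, 228.3, 2.44),
    "PRESENT":       (64, 250.2, 2.32),
    "PRINCE":        (64, 146.3, 4.09),
    "Midori64":      (64, 121.0, 2.12),
}

# Round counts R stated in Sect. 3.3 (only the Midori variants are given there).
ROUNDS = {"Midori64": 16, "Midori128": 20}

energy_per_bit = {name: energy / bits for name, (bits, energy, _) in ED_ROWS.items()}
cumulative_latency = {name: ED_ROWS[name][2] * r for name, r in ROUNDS.items()}   # in ns
lowest_energy_per_bit = min(energy_per_bit, key=energy_per_bit.get)

for name, value in sorted(energy_per_bit.items(), key=lambda kv: kv[1]):
    print(f"{name:14s} {value:.2f} pJ/bit")
```

The recomputed energy/bit values agree with the Energy/bit column of Table 9 to two decimal places, with the two Midori variants at the low end.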



Fig. 9. Cumulative latency vs Energy/bit figures (horizontal axis: Energy/bit (pJ)) 


7 Conclusion 

In this paper we present the block ciphers Midori128 and Midori64, optimized 
with respect to energy consumption. We first identify design choices that make 
a given algorithm efficient in terms of energy. Thereafter we propose two design 
components, i.e. the MixColumn matrix and the S-box, that help us achieve the objectives 
of low-energy design. These components are additionally involutive, which makes it 
easier to design a circuit with functionalities for both encryption and decryption. 
The energy of the proposed design was then found to be optimal in comparison 
with state-of-the-art block ciphers available in the literature. 

Appendix A: Test Vectors 

A. Midori128 

1. Plaintext  : 00000000000000000000000000000000 
   Key        : 00000000000000000000000000000000 
   Ciphertext : c055cbb95996d14902b60574d5e728d6 

2. Plaintext  : 51084ce6e73a5ca2ec87d7babc297543 
   Key        : 687ded3b3c85b3f35b1009863e2a8cbf 
   Ciphertext : 1e0ac4fddff71b4c1801b73ee4afc83d 

B. Midori64 

1. Plaintext  : 0000000000000000 
   Key        : 00000000000000000000000000000000 
   Ciphertext : 3c9cceda2bbd449a 

2. Plaintext  : 42c20fd3b586879e 
   Key        : 687ded3b3c85b3f35b1009863e2a8cbf 
   Ciphertext : 66bcdc6270d901cd 


Abstract. Recent advances in block-cipher theory deliver security 
analyses in models where one or more underlying components (e.g., a 
function or a permutation) are ideal (i.e., randomly chosen). This paper 
addresses the question of finding new constructions achieving the highest 
possible security level under minimal assumptions in such ideal models. 

We present a new block-cipher construction, derived from the Swap-or-Not 
construction by Hoang et al. (CRYPTO ’12). With n-bit block 
length, our construction is a secure pseudorandom permutation (PRP) 
against attackers making $2^{n-O(\log n)}$ block-cipher queries, and $2^{n-O(1)}$ 
queries to the underlying component (which has itself domain size 
roughly n). This security level is nearly optimal. So far, only 
key-alternating ciphers have been known to achieve comparable security 
using O(n) independent random permutations. In contrast, we only use 
a single function or permutation, and still achieve similar efficiency. 

Our second contribution is a generic method to enhance a block 
cipher, initially only secure as a PRP, to additionally withstand related- 
key attacks without substantial loss in terms of concrete security. 


Keywords: Block-cipher theory • Related-key security 


1 Introduction 

Several recent works provide ideal-model security proofs for key-alternating
ciphers [2,14-17,19,23,25,26,31,50] and for Feistel-like ciphers [20,29,34,38,42].
In these proofs, the underlying components (which are either permutations or
functions) are chosen uniformly at random, and are public, i.e., the attacker can
evaluate them. At the very least, these proofs target pseudorandom permutation
(PRP) security: The block cipher, under a secret key, must be indistinguishable
from a random permutation, provided the attacker makes at most q queries to
the cipher, and at most q_F queries to the underlying component, for q and q_F
as large as possible.

Ideal-model proofs imply that the block cipher is secure against generic
attacks (i.e., treating every component as a black box). Heuristically, however,
one hopes for even more: Namely, that under a careful implementation of the
underlying component, the construction retains the promised security level.

Contributions. This paper contributes along two different axes: 

- Weaker Assumptions. We present a new block-cipher design achieving
near-optimal security, i.e., it remains secure even when q and q_F approach
the sizes of the block-cipher and component domains, respectively. Our
construction can be instantiated from a function or, alternatively, from a single
permutation. This is the first construction from a function with such a security
level, and previous permutation-based constructions all relied on multiple
permutations to achieve such high security.

- Related-key Security. We show how to enhance our construction to achieve 
related-key security without significantly impacting its efficiency and security. 
This is achieved via a generic transformation of independent interest. 

This work should not be seen primarily as suggesting a new practical block-cipher
construction, but rather as understanding the highest achievable security level in
the model in which block ciphers are typically analyzed. The resulting technical
questions are fairly involved, and resolving them is where we see our contributions.

Still, we hope that our approach may inspire designers. Our instantiation
from a permutation gives a possible path for a first proof-of-concept implementation,
where one simply takes a single round of AES as the underlying permutation.
(And in fact, even a simpler object may be sufficient.)


1.1 First Contribution: Full-Domain Security 

We start by explaining our construction from a (random) function. Concretely,
we consider block-cipher constructions BC with block length n and key length
κ using an underlying keyless function F with m-bit inputs. We say that BC is
(q, q_F)-secure (as a PRP) if no attacker can distinguish with substantial advantage
the real world - where it can query q_F times a randomly sampled function
F and overall q times the block cipher BC_K (using the function F and a random
secret key K) - from an ideal world where BC_K is replaced by an independent
random permutation of the n-bit strings. (In fact, we typically also allow inverse
queries to the block cipher and the permutation.)

Our goal. Let us first look at what we can expect for q and q_F when a cipher
is (q, q_F)-secure. Clearly, q_F ≤ 2^m and q ≤ 2^n, assuming queries are distinct.
However, one can also prove that (roughly) q_F ≤ 2^κ is necessary; otherwise, the
adversary can mount a brute-force key-search attack. Moreover, q ≤ 2^m must
also hold (cf. e.g. [28] for a precise statement of these bounds).

Here, we target (near) optimal security, i.e., we would like to achieve security
for q and q_F as close as possible to 2^n and 2^m, respectively, whenever m ≥ n.
That is, the construction should remain secure even if the adversary can query
most of its domain, and of that of the underlying function F. We note that the
question is meaningful for every value of m ≥ n, but we specifically target the
case where m ≈ n, e.g., m = n, or m = n + O(log n).

Previous constructions from functions fall short of achieving this: Gentry
and Ramzan [29], and the recent generalization of their work by Lampe and
Seurin [38], use a Feistel-based approach with m = n/2, and this hence yields
(at best) (2^{n/2}, 2^{n/2})-security. (The work of [38] approaches that security level
for an increasing number of rounds.) In contrast, key-alternating ciphers (KACs)
have been studied in several works [2,14-16,19,23,26,31,50], and the tightest
bounds show them to be (2^{n(1-ε)}, 2^{n(1-ε)})-secure, when using O(1/ε) rounds,
each calling an (independent) n-bit random permutation. However, there is no
way of making direct use of KACs given only a non-invertible function.

The WSN construction. Our construction - which we call Whitened Swap-or-Not
(WSN) - adds simple whitening steps to the Swap-or-Not construction by
Hoang, Morris, and Rogaway [33], which was designed for the (different) setting
where the component functions are secret-key primitives. Concretely, the WSN
construction, on input X = X_0, iterates R times a very simple round structure
of the form

    X_{i+1} ← X_i ⊕ (F_{b(i)}(W_i ⊕ max{X_i, X_i ⊕ K_i}) · K_i),

where W_i and K_i are round keys, max of two strings returns the largest with
respect to lexicographic ordering, and F_{b(i)}(x) returns the first bit of F(x) in
the first half of the rounds, and the second bit in the second half. (Moreover,
· denotes simple scalar multiplication with a bit, i.e., b · X = X if b = 1, and
0^n else.) In particular, our construction requires F to only output 2 bits. The
round structure is very weak,^1 and it differs from the construction of [33] in that
the same round function is invoked over multiple rounds, and as this function is
public, we use a key W_i to whiten the input. We prove the following:

Main Theorem (Informal). The WSN construction for R = O(n)
rounds is (2^{n-O(log n)}, 2^{n-O(1)})-secure.

Note that O(n) rounds are clearly asymptotically optimal.^2 For some parameter
cases, techniques from [47,49] can in fact be used to obtain a (2^n, 2^{n(1-ε)})-secure
PRP, at the cost of a higher number of rounds.

Functions vs. permutations. It is beyond the scope of this paper to assess 
whether a function is a better starting point than a permutation in practice. 
Independently of this, we believe that studying constructions from functions is 
a fundamental theoretical problem for at least two reasons. 


1 A single round can easily be distinguished from a random permutation with a
constant number of queries, as every input x is mapped to either x or x ⊕ K_i.

2 Even for one single query, every internal call to F can supply at most one bit of
randomness, and the output must be (information theoretically) indistinguishable
from a random n-bit string, and thus Ω(n) calls are necessary.

Foremost, functions are combinatorially simpler than permutations, and thus,
providing constructions from them (and thus enabling a secure permutation 
structure) is an important theoretical question, akin to (and harder than) the 
problem of building PRPs from PRFs covered by a multitude of papers. Also, 
practical designs from keyless round functions have been considered (cf. e.g. [1]). 

In addition, our construction only requires c = 2 output bits, and it is worth
investigating whether such short-output functions are also harder to devise than
permutations. We in fact provide some theoretical evidence that this may not
be the case. We prove that an elegant construction by Hall, Wagner, Kelsey, and
Schneier [32] can be used to transform any permutation π from n + c bits to n + c
bits into a function from n bits to c bits which is perfectly indifferentiable [44]
from a random function. This property ensures that the concrete security of
every cipher using a function F : {0,1}^n → {0,1}^c is preserved if we replace F
with the construction from π, and allow the adversary access to π and its inverse
π^{-1}. The construction makes 2^c permutation calls, and thus only makes sense for
small c. In contrast, it should be noted that the only indifferentiable construction
of a permutation from functions is complex and weakly secure [34], and that no
suitable constant-complexity high-security constructions of large-range functions
from permutations exist, the most secure construction being [41,46].

A single-permutation instantiation. With c = 2, combining the WSN
construction with the HWKS construction yields a secure cipher with n-bit block
length from a single permutation on (n + 2)-bit strings. In contrast, we are not
aware of any trick to instantiate KACs from a single permutation retaining
provable nearly-optimal security, even by enlarging the domain of the permutation.
The only exception is the work of [15], which however only considers two rounds
and hence falls short of achieving full-domain security.

The complexity of the resulting construction matches (asymptotically) that of
KACs when targeting (2^{n-O(log n)}, 2^{n-O(1)})-security. Nonetheless, a clear advantage
of KACs is that their security degrades smoothly when reducing the number
of rounds, whereas here O(n) rounds remain necessary even for (1, 0)-security.
We note that in the setting of functions, constructions with such smooth security
degradation are not known, even in the simpler setting of [33].

Reducing the key length. Arguably, an obvious drawback of our construction
is that the key length grows with the number of rounds. We note that this
is also true for key-alternating ciphers, and it is not unique to our construction.

It is worth noting that the key length can be reduced via standard techniques
without affecting security, by deriving the round keys from a single (n - d)-bit
master key K as K_i ← H(K ‖ ⟨i - 1⟩) and W_i ← H(K ‖ ⟨R + i - 1⟩) for all i ∈ [R]
and a function H : {0,1}^n → {0,1}^n (to be modeled as random in the proof),
where ⟨·⟩ denotes the (d = ⌈log(2R) + 1⌉)-bit binary encoding of an integer in
[2R]. (Note that d = O(log n).) The security proof is fairly straightforward, and
omitted - it essentially amounts to excluding the event that H is queried on one
of the values related to the key, and then reducing the analysis to the one with
large keys. This adds an additional q_H · R / 2^{n-d} term to the bound, where q_H is
the number of queries to H. H can in fact be built from the very same function
F, but this requires a slightly more involved analysis.
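
The key-derivation step above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `h` and `derive_keys` are hypothetical, and truncated SHA-256 stands in for the function H, which the analysis models as random (so n ≤ 256 here). Strings are represented as Python integers.

```python
import hashlib
import math


def h(x: int, n: int) -> int:
    """Stand-in for H : {0,1}^n -> {0,1}^n (assumption: truncated SHA-256, n <= 256)."""
    digest = hashlib.sha256(x.to_bytes((n + 7) // 8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - n)


def derive_keys(master: int, n: int, num_rounds: int):
    """Derive K_1..K_R and W_1..W_R from an (n - d)-bit master key."""
    # Counter width d = ceil(log(2R) + 1), as in the text.
    d = math.ceil(math.log2(2 * num_rounds) + 1)
    assert 0 <= master < 2 ** (n - d), "master key must fit in n - d bits"

    def block(counter: int) -> int:
        # H(K || <counter>), with <counter> encoded on d bits.
        return h((master << d) | counter, n)

    round_keys = [block(i - 1) for i in range(1, num_rounds + 1)]
    whitening_keys = [block(num_rounds + i - 1) for i in range(1, num_rounds + 1)]
    return round_keys, whitening_keys
```

For example, `derive_keys(12345, 128, 16)` returns 16 round keys and 16 whitening keys of 128 bits each, all derived from a single 122-bit master key.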


1.2 Second Contribution: Related-Key Security 

In the second part, we show how to generically make any block-cipher construc- 
tion secure against related-key attacks (or RKA secure, for short) while preserving 
full-domain security and small input length of the underlying function. 

On RKA security. Several attacks over the last two decades (cf. e.g. [8-13,
35]) have motivated RKA security as the new gold standard for block-cipher
security. As formalized by Bellare and Kohno [5], RKA security is parameterized
by a class of key transformations Φ. Then, the pseudorandomness notion defined
above is extended to allow the attacker block-cipher queries of the form
(φ, +, X) or (φ, -, Y) for φ ∈ Φ and X, Y ∈ {0,1}^n, resulting in BC_{φ(K)}(X) and
BC^{-1}_{φ(K)}(Y), respectively.

It is easy to see that WSN is not RKA secure if the class Φ allows for XORing
chosen offsets to individual keys. Querying an input X (with the original key),
and querying X ⊕ Δ while adding Δ to K_1 results in the same output with
probability 1/2. In the random permutation model, two recent works [19,26]
have shown that KACs are RKA secure (for appropriate key scheduling), yet
the resulting construction is only (2^{n/2}, 2^{n/2})-secure. Here, in contrast, we target
full-domain security of the cipher.

Related-key secure key-derivation. We consider a generic approach to
shield ciphers from related-key attacks using related-key secure key-derivation
functions (or RKA-KDF, for short). These are functions KDF : {0,1}^κ → {0,1}^ℓ
with the property that under a random secret key K, the outputs of KDF(φ(K)),
for different φ ∈ Φ, look random and independent. A similar concept was proposed
by Lucks [40], and further formalized by Barbosa and Farshim [3]. For any
secure block cipher BC, the new block cipher computes, for key K and input X,
the value BC_{KDF(K)}(X), and is easily proved to be RKA-secure. Note that this
approach is very different from the one used for standard-model RKA-secure
PRF and PRP constructions (as in [4]), which leverage algebraic properties of
PRF constructions.^3

Building RKA-KDFs in ideal models may appear too easy: A hash function
H : {0,1}^* → {0,1}^ℓ, when modeled as a random oracle [6], is a secure
RKA-KDF. However, such a construction can be broken in 2^{κ/2} queries by a simple
collision argument.^4 If our goal is to achieve security almost 2^n to preserve
security of e.g. WSN above, then we need to set κ ≥ 2n. But what if we are
building our block cipher from a primitive with n-bit inputs, like the very same
primitive used to build the block cipher, as in the WSN setting above?

3 Also, our requirements are stronger than those for non-malleable codes and
non-malleable key-derivation [24,27].

4 For example, for q ≈ 2^{κ/2}, and an additive RKA attack asking for random
Δ_1, ..., Δ_q, one of the values H(K ⊕ Δ_i) is going to collide with constant
probability with one of the values H(X_i), for independent κ-bit strings X_1, ..., X_q,
allowing to distinguish.

One approach is to use a domain extender in the sense of indifferentiability
[44]. The only known construction with (near) optimal security is due to Maurer
and Tessaro [45] (MT), and further abstracted by Dodis and Steinberger [22].
Unfortunately, instantiations of the MT construction are very inefficient, and
make O(n^c) calls to the underlying function for some undetermined (and fairly
large) c.

MT-based RKA-KDFs. As our second contribution, we provide a highly parallelizable
construction of an RKA-KDF from a keyless function with nearly optimal
security, i.e., its outputs are pseudorandom even when evaluated on q = 2^{n(1-ε)}
related keys, and the underlying function can be evaluated q_F = 2^{n(1-ε)} times,
where ε > 0. Our construction is a variant of the MT construction. However,
while the latter is inefficient as it relies on a complex combinatorial object, called
an input-restricting function family, here, we show that to achieve RKA-KDF
security it is sufficient to use a much simpler hitter [30], which can for instance
be built from suitable constant-degree expander graphs.

Overall, our construction needs O(n) calls to independent n-to-n-bit functions.
(It can also be reformulated to call a single n-to-n-bit function.) We see
it as a challenging open problem to improve the complexity, but we note that
this already yields the most efficient known approach to ensure high related-key
security for block ciphers built from ideal primitives.

Indifferentiability. The question of building a block cipher from a random 
function which is as secure as an ideal cipher (with respect to indifferentiability) 
was studied and solved by [20,34]. In the same vein, indifferentiable KAC-like 
cipher constructions from permutations have been given [2,31,37]. While these 
constructions are related-key secure, their concrete security is fairly weak. 

2 Preliminaries 

2.1 Notation 

Throughout this paper, we let [n] := {1, ..., n}. Further, we denote by Fcs(m, n)
the set of functions mapping m-bit strings to n-bit strings, and by Fcs(*, n) the
set of functions {0,1}^* → {0,1}^n. Similarly, we let Perms(n) ⊆ Fcs(n, n) be
the set of permutations on {0,1}^n. Given a string X ∈ {0,1}^m, we denote by
X[i ... j] (for i < j) the sub-string consisting of bits i, i + 1, ..., j - 1, j of X. We
also write X^{≤ i} instead of X[1 ... i]. Further, given another string X' ∈ {0,1}^n,
we denote by X ‖ X' the (m + n)-bit concatenation of X and X'.

Algorithms, constructions, and adversaries in this paper are with respect to
some (not further specified) RAM model of computation. We explicitly denote by
C[F] the fact that a construction C (implementing a function) makes queries to
another function F, and we denote by A^O the fact that an adversary A accesses
an oracle O. We denote by x ←$ S the process of sampling x from the set S
uniformly at random, and by y ←$ A^O the process of running the randomized
algorithm A with access to a randomized oracle O, and sampling its output y.
Also, we denote by A^O ⇒ y the event that the concrete value y is output in the
same experiment. In general, we use a notation close to the one of Bellare and
Rogaway’s Game Playing framework [7], which we hope to be self-evident.

Additionally, we denote by Pr[X = x] the probability that the random
variable X takes the value x, and by E[X] its expected value. Also, the statistical
distance between two random variables X and X' is

    SD(X, X') = \frac{1}{2} \sum_{x} | Pr[X = x] - Pr[X' = x] | ,

where the sum is over all values x which can be taken by X or X'.
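
As a small illustration of this definition, the statistical distance between two finite distributions can be computed directly from their probability tables. The function name and the dictionary representation below are assumptions made for this sketch; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def statistical_distance(p: dict, q: dict) -> Fraction:
    """SD(X, X') = 1/2 * sum_x |Pr[X = x] - Pr[X' = x]| over the joint support."""
    support = set(p) | set(q)
    total = sum(abs(Fraction(p.get(x, 0)) - Fraction(q.get(x, 0))) for x in support)
    return Fraction(1, 2) * total


# A uniform bit versus the constant 0.
uniform_bit = {0: Fraction(1, 2), 1: Fraction(1, 2)}
constant_zero = {0: Fraction(1)}
print(statistical_distance(uniform_bit, constant_zero))
```

Identical distributions have distance 0, and distributions with disjoint supports have distance 1.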


2.2 Ideal Models 

Our analyses are in the random function model, where algorithms and adversaries
are relative to a randomly chosen function F ←$ Fcs(m, ℓ) for parameters m and ℓ.
A variant of the model grants access to multiple independent random functions
F_1, ..., F_t ←$ Fcs(m, ℓ), but these can equivalently be implemented in the single
random function model for m' = m + ⌈log t⌉, where the individual functions F_i are
obtained as F_i(X) = F(⟨i⟩ ‖ X), with ⟨i⟩ representing a ⌈log t⌉-bit encoding of i.
We often denote F = (F_1, ..., F_t) to stress this dual representation explicitly.
Therefore, all upcoming definitions are in the single random function model
without loss of generality.

We also recall that we can build a function F from m bits to ℓ bits by
making ℓ calls to a function F' from m + ⌈log ℓ⌉ bits to a single bit, i.e., F(X) =
F'(⟨0⟩ ‖ X) ‖ ··· ‖ F'(⟨ℓ - 1⟩ ‖ X). The statement can be made precise via the
notion of perfect indifferentiability [44], which we review in Appendix A.
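
Both transformations above rely on the same prefix-encoding idea, which the following sketch illustrates for the output-extension case. The names `base_f`, `ceil_log2`, and `extended_f` are hypothetical, and the low bit of SHA-256 is only a stand-in for the single-bit function F'; it is not a random function.

```python
import hashlib


def base_f(x: int, in_bits: int) -> int:
    """Stand-in for the single-bit function F' (assumption: low bit of SHA-256)."""
    data = x.to_bytes((in_bits + 7) // 8, "big")
    return hashlib.sha256(data).digest()[-1] & 1


def ceil_log2(t: int) -> int:
    """Smallest k with 2^k >= t, for t >= 1."""
    return (t - 1).bit_length()


def extended_f(x: int, m: int, ell: int) -> int:
    """F(X) = F'(<0> || X) || ... || F'(<ell-1> || X), returned as an ell-bit integer."""
    prefix_bits = ceil_log2(ell)
    out = 0
    for i in range(ell):
        # <i> || X: the index occupies the high-order bits, X the low m bits.
        out = (out << 1) | base_f((i << m) | x, m + prefix_bits)
    return out
```

The same prefixing pattern, F_i(X) = F(⟨i⟩ ‖ X), gives t separate functions from a single one.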

The definitions of this section also naturally extend to the random permutation
model, where adversaries and algorithms can query one or more random
permutations sampled uniformly from Perms(n). In particular, adversaries are
also allowed to query the inverses of these permutations.

2.3 Block Ciphers and (Related-Key) Pseudorandomness

Let BC[F] : {0,1}^κ × {0,1}^n → {0,1}^n be an efficient construction making
calls to a function F ∈ Fcs(m, ℓ). (We generally omit F whenever clear from
the context.) We say that BC = BC[F] is a (κ, n)-block cipher if BC(K, ·) is a
permutation for all κ-bit K and all F ∈ Fcs(m, ℓ), and use the notation BC_K
to refer to this permutation. Typically, we assume that BC_K and BC_K^{-1} are
very efficient to compute given K, where efficiency in particular implies a small
number of calls to F.

(Multi-user) PRPs. We require block ciphers to be secure pseudorandom
permutations (PRPs) [39]. In particular, we consider a multi-user version of PRP
security, which captures joint indistinguishability of an (a priori unbounded)
number of block-cipher instantiations under different independent keys. The
traditional (single-user) PRP notion is recovered by considering adversaries making
queries for one single key. While the single- and multi-user versions are related by
a hybrid argument, sticking with the latter will allow potentially tighter bounds
in the second part of this paper, as the standard hybrid argument cannot be
made very tight given only an overall bound on the number of queries.

To this end, we consider two security games PRP-b^A_{BC,F} for b ∈ {0,1}. In both,
F ←$ Fcs(m, ℓ) is initially sampled, as well as independent keys K_1, K_2, ... ←$
{0,1}^κ, and permutations P_1, P_2, ... ←$ Perms(n).^5 Then, the adversary A is
executed, and is allowed to issue two types of queries:

- Function queries x, returning F(x).

- Construction queries (i, σ, z), where i ∈ N, σ ∈ {-,+}, z ∈ {0,1}^n. For
b = 1, the query returns BC_{K_i}(z) (if σ = +, this is a forward query) or BC^{-1}_{K_i}(z)
(if σ = -, and this is a backward query). For b = 0, the query returns P_i(z)
or P_i^{-1}(z), respectively.

Finally, A outputs a bit, which is also the game’s output. Then, PRP-security of
BC is defined via the following advantage metric

    Adv^{PRP}_{BC,F}(A) := Pr[PRP-1^A_{BC,F} ⇒ 1] - Pr[PRP-0^A_{BC,F} ⇒ 1].

We also denote by Adv^{PRP}_{BC,F}(q, q_F) the maximal advantage of an adversary A
making at most q construction queries and q_F function queries. Informally, we
say that BC is (q, q_F)-secure if Adv^{PRP}_{BC,F}(q, q_F) is “small”, i.e., negligible in κ.
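
The ideal-world oracle of this game (b = 0) can be simulated without writing down whole permutations, by sampling each P_i lazily as queries arrive. The sketch below is one illustrative way to do this; the class and function names are hypothetical, and rejection sampling is an implementation choice made here, not something prescribed by the definition.

```python
import secrets


class LazyPermutation:
    """A uniformly random permutation on {0,1}^n, sampled one point at a time."""

    def __init__(self, n: int):
        self.n = n
        self.fwd = {}  # defined points of P
        self.bwd = {}  # defined points of P^{-1}

    def query(self, sign: str, z: int) -> int:
        # Forward queries extend fwd; backward queries extend bwd.
        table, other = (self.fwd, self.bwd) if sign == "+" else (self.bwd, self.fwd)
        if z not in table:
            # Rejection-sample a value that is not yet used on the other side.
            while True:
                v = secrets.randbits(self.n)
                if v not in other:
                    break
            table[z] = v
            other[v] = z
        return table[z]


def make_ideal_oracle(n: int):
    """Answer construction queries (i, sign, z) using independent lazy permutations."""
    perms = {}

    def oracle(i: int, sign: str, z: int) -> int:
        if i not in perms:
            perms[i] = LazyPermutation(n)
        return perms[i].query(sign, z)

    return oracle
```

Forward and backward answers stay consistent with each other, so the adversary's view matches that of a permutation fixed in advance.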

Related-key secure PRPs. We target the traditional notion of a related-key
secure (or RKA-secure) PRP introduced by Bellare and Kohno [5]. In particular,
for a key length κ, we consider a family Φ ⊆ Fcs(κ, κ) of key transformations.
Given a (κ, n)-block cipher BC = BC[F] as above, we define the following two
games RKA-PRP-1 and RKA-PRP-0. The game RKA-PRP-b^A_{BC,F,Φ} proceeds as
follows: It first samples F ←$ Fcs(m, ℓ), a key K ←$ {0,1}^κ, and 2^κ independent
permutations P_{K'} ←$ Perms(n) for all κ-bit K'. Then, A issues two types of queries:

- Function queries x, returning F(x).

- Construction queries (σ, φ, z), where σ ∈ {-,+}, φ ∈ Φ, z ∈ {0,1}^n. For
b = 1, the query returns BC_{φ(K)}(z) (if σ = +, this is a forward query) or
BC^{-1}_{φ(K)}(z) (if σ = -, and this is a backward query). For b = 0, the query
returns P_{φ(K)}(z) or P^{-1}_{φ(K)}(z), respectively.

Finally, A outputs a bit, which is also the game’s output. We define the RKA-PRP
advantage as

    Adv^{RKA-PRP}_{BC,F,Φ}(A) = Pr[RKA-PRP-1^A_{BC,F,Φ} ⇒ 1] - Pr[RKA-PRP-0^A_{BC,F,Φ} ⇒ 1].

The advantage measure Adv^{RKA-PRP}_{BC,F,Φ}(q, q_F) is defined by taking the maximum.

5 As we are sampling infinitely many objects, one can think of sampling these lazily
the first time they are needed.

3 The Whitened Swap-or-Not Construction

3.1 The Construction 

We present a construction of a block cipher using a function F : {0,1}^n →
{0,1}^2, which we refer to as the Whitened Swap-or-Not construction, or WSN
for short. This construction naturally extends the Swap-or-Not construction by
Hoang, Morris, and Rogaway [33] to the keyless-function setting.

For any even round number R = 2r, the construction WSN = WSN^{(R)}
expects round keys K_1, ..., K_R and whitening keys W_1, ..., W_R, which are all
n-bit strings. Its computation proceeds as follows, where j(i) = 1 if i ≤ r, and
j(i) = 2 else, and we interpret F as two functions F_1 and F_2 such that F_j(x)
returns the j-th bit of F(x) for j ∈ {1,2}.

Construction WSN_{K_1,...,K_R, W_1,...,W_R}(X_0):    // X_0 ∈ {0,1}^n
    For i = 1, ..., R do
        X'_{i-1} ← max{X_{i-1}, X_{i-1} ⊕ K_i}
        If F_{j(i)}(W_i ⊕ X'_{i-1}) = 1 then X_i ← X_{i-1} ⊕ K_i else X_i ← X_{i-1}
    Return X_R

In the description, the max of two strings is with respect to the lexicographic
order, and note that its purpose is to elect a unique representative for every pair
{X, X ⊕ K}. As in [33], the construction extends naturally to domains which
are arbitrary abelian groups. However, we will stick with the special case of bit
strings in the following.

It is easy to see that the construction can efficiently be inverted given the 
keys, simply by reversing the order of the rounds. 
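
The construction and its inversion can be sketched in a few lines. This is a toy illustration only: n-bit strings are represented as integers (so integer comparison coincides with lexicographic order on fixed-length strings), the names `toy_f`, `wsn_round`, and `wsn` are hypothetical, and SHA-256 merely stands in for the two-bit function F, which the analysis assumes to be random.

```python
import hashlib
import secrets


def toy_f(x: int, n: int):
    """Stand-in for F : {0,1}^n -> {0,1}^2 (assumption: two bits of SHA-256)."""
    first_byte = hashlib.sha256(x.to_bytes((n + 7) // 8, "big")).digest()[0]
    return first_byte & 1, (first_byte >> 1) & 1


def wsn_round(x: int, k: int, w: int, j: int, n: int) -> int:
    """One round: x -> x XOR k if F_j(w XOR max{x, x XOR k}) = 1, else x."""
    rep = max(x, x ^ k)             # unique representative of the pair {x, x XOR k}
    bit = toy_f(w ^ rep, n)[j - 1]  # j-th output bit of F on the whitened input
    return x ^ k if bit else x


def wsn(x: int, round_keys, whitening_keys, n: int, inverse: bool = False) -> int:
    """WSN with R = 2r rounds: rounds 1..r use F_1, rounds r+1..R use F_2."""
    total = len(round_keys)
    r = total // 2
    # Each round is an involution, so inversion runs the rounds in reverse order.
    order = range(total, 0, -1) if inverse else range(1, total + 1)
    for i in order:
        j = 1 if i <= r else 2
        x = wsn_round(x, round_keys[i - 1], whitening_keys[i - 1], j, n)
    return x


n, R = 8, 16
ks = [secrets.randbits(n) for _ in range(R)]
ws = [secrets.randbits(n) for _ in range(R)]
ciphertexts = [wsn(x, ks, ws, n) for x in range(2 ** n)]
```

Since x and x ⊕ K_i share the same representative, both see the same bit of F, which is why every round (and hence the whole construction) is a permutation for any choice of keys and of F.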

3.2 Security of the WSN Construction 

Compared with the original Swap-or-Not construction, WSN adds at each round
a whitening key W_i to the input of a (publicly evaluable) round function F_{j(i)}, as
opposed to using a secret independent random function F_i (which in particular
cannot be queried directly by the adversary). It is a well-known folklore fact
that for a function F : {0,1}^n → {0,1}, the construction mapping a key W and
an input X to F(W ⊕ X) is indistinguishable from a random function under a
random secret key W when F is random and publicly evaluable.

However, the high security of WSN does not follow by simply composing
this folklore fact with the original analysis [33]. This is because the folklore
construction can easily be distinguished from a random function via O(2^{n/2})
queries to F(W ⊕ ·) (or a random function f ←$ Fcs(n, 1)), and O(2^{n/2}) queries to
F.^6 To overcome this, a valid black-box instantiation would use a more complex
construction mapping X to F(W_1 ⊕ X) ⊕ ··· ⊕ F(W_k ⊕ X) (analyzed in [28]) for
the round functions within Swap-or-Not. This would however result in roughly
O(n^2) calls to F, as opposed to O(n) achieved by WSN.

Security of WSN. The following theorem establishes the concrete security of 
the WSN construction with R = 2r rounds. 

Theorem 1 (Security of WSN). For all q, q_F > 0 and for all r ∈ N, we have

    \mathrm{Adv}^{\mathrm{PRP}}_{\mathrm{WSN}^{(2r)}[F],F}(q, q_F) \le 2\sqrt{2} \cdot \sqrt{q} \cdot 2^{n/4} \cdot \left( \frac{1}{2} + \frac{q \cdot r + q_F}{2^{n+1}} \right)^{r/4} .

The proof of Theorem 1 is given in Sect. 3.3 below. Note that if r · q + q_F =
(1 - α)2^n, then the above term can be made to be 2^{-n} for r = O(n/α). For
example, this allows us to infer security for q = 2^{n - log n - O(1)} and q_F = 2^{n-2}.
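
The parameter claim can be checked with a short calculation. The derivation below is a sketch that assumes the bound of Theorem 1 has the form 2√2 · √q · 2^{n/4} · (1/2 + (q·r + q_F)/2^{n+1})^{r/4}; the constants are not optimized.

```latex
\begin{align*}
\frac{1}{2} + \frac{q \cdot r + q_F}{2^{n+1}}
  &= \frac{1}{2} + \frac{1-\alpha}{2} = 1 - \frac{\alpha}{2}
  && \text{(using } r \cdot q + q_F = (1-\alpha)2^n\text{)} \\
\left(1 - \frac{\alpha}{2}\right)^{r/4}
  &\le e^{-\alpha r / 8}
  && \text{(using } 1 - x \le e^{-x}\text{)} \\
2\sqrt{2} \cdot \sqrt{q} \cdot 2^{n/4} \cdot e^{-\alpha r/8}
  &\le 2^{3/2 + 3n/4} \cdot e^{-\alpha r / 8}
  && \text{(using } q \le 2^n\text{)} \\
  &\le 2^{-n}
  && \text{whenever } r \ge \frac{8 \ln 2}{\alpha}\left(\frac{7n}{4} + \frac{3}{2}\right) = O(n/\alpha).
\end{align*}
```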

We also have no reason to believe that the construction would be insecure if 
we used a function with a single output bit throughout the evaluation, but we 
could not find a suitable proof and leave this analysis as an open problem. 

Single-permutation instantiation. The WSN construction can be instantiated
from a single permutation if we are ready to enlarge the domain of the
permutation to n + 2 bits. This follows from a result of independent interest,
proved in Appendix B. Namely, we prove that a 2^c-call construction of a function
F^π ∈ Fcs(n, c) from any permutation π ∈ Perms(n + c) due to Hall, Wagner,
Kelsey, and Schneier [32] is perfectly indifferentiable [44] from a random function.
This in particular implies (by the composition theorem in Appendix A) that we
can replace the function F by our construction and still achieve the same security
bound in the random permutation model.

Full-domain security. Two recently published works [47,49] enhance Swap-or-Not
to full-domain security (i.e., security against q = 2^n queries) at the cost
of making O(n^2) calls to the construction in the worst case. (The later work [47]
shows how to reduce the complexity to O(n) in the average case.) One could hope
to use their results generically to obtain (2^n, 2^{n(1-ε)})-security in our setting.

Unfortunately, these results require security for q = 2^{n-1}, which is unattainable
by the above bound. By inspecting the proof of Theorem 1, it is however
not hard to verify that a version of the WSN construction with independent
round functions F_1, ..., F_r can be made to achieve (2^{n-1}, 2^{n(1-ε)})-security (in
essence, this is because one can easily reduce the exponential term in the bound
to (1/2 + (q + q_F)/2^{n+1})^{r/4}), and the results from [47,49] can be used in a
black-box way.

Nevertheless, we point out that in contrast to the small-domain setting of [47,
49], here we are mostly targeting a large n (e.g., n = 128), for which 2^{n(1-ε)}
security can be largely sufficient. The additional cost may thus not be necessary.

6 Roughly, pick X_1, ..., X_Q, X'_1, ..., X'_Q to be independent uniform strings of
length n - k, for some k = ⌈log n⌉ and Q ≈ 2^{n/2}. Then one just queries Y_{i,z} ← F(W ⊕
(X_i ‖ z)) and Y'_{j,z} ← F(X'_j ‖ z) for all i, j ∈ [Q] and z ∈ {0,1}^k. The distinguisher
finally outputs one if and only if there exist i and j such that Y_{i,z} = Y'_{j,z} for all
z ∈ {0,1}^k.

3.3 Proof of Theorem 1

Our proof shares similarities with the original analysis of Swap-or-Not [33], but 
dealing with the setting where the function F is public requires a careful exten- 
sion and different techniques. To this end, we follow an approach used in previ- 
ous works by Lampe, Patarin, and Seurin [36], and by Lampe and Seurin [38] to 
reduce security analyses for PRP constructions in ideal models to a non-adaptive 
analysis. (With some extra care due to the fact that we deal with the multi-user 
PRP security notion). In particular, we are first going to prove that the WSN 
construction, restricted to half of its rounds, satisfies a weaker non-adaptive 
security requirement, which we introduce in the following paragraph. 

Non-adaptive security. Let BC = BC[F] be a (κ, n)-block cipher construction
based on some function F : {0,1}^m → {0,1}^ℓ. Now, let us fix a set of tuples
T_F = {(x_i, y_i)}_{i ∈ [q_F]} with x_i ∈ {0,1}^m and y_i ∈ {0,1}^ℓ for all i ∈ [q_F], and such
that every x_i appears only in one pair in T_F. Moreover, let us fix a sequence X
of q distinct inputs such that X[j] = (i_j, X_j) for all j ∈ [q], where i_j ∈ N and
X_j ∈ {0,1}^n.

Then we consider two processes - sampling two sequences Y and Y' of q
n-bit strings - defined as follows:

- Y (the real world distribution) is obtained by sampling random κ-bit strings
K_1, K_2, ... ←$ {0,1}^κ, sampling a random F ←$ Fcs(m, ℓ) conditioned on
satisfying F(x_i) = y_i for all i ∈ [q_F], and finally letting Y[j] ← BC[F]_{K_{i_j}}(X_j)
for all j ∈ [q].

- Y' (the ideal world distribution) is obtained by sampling random permutations
P_1, P_2, ... ←$ Perms(n), and letting Y'[j] ← P_{i_j}(X_j) for all j ∈ [q].

Then, we define the advantage metric

    Adv^{NCPA-PRP}_{BC[F],F}(X, T_F) := SD(Y, Y'),

where SD denotes statistical distance. Moreover, let Adv^{NCPA-PRP}_{BC[F],F}(q, q_F) be the
maximum of Adv^{NCPA-PRP}_{BC[F],F}(X, T_F) taken over all q-sequences X and all sets T_F of
size q_F.

From non-adaptive to adaptive security. We make use of the following 
lemma. The proof is very similar to previous works [36,38] and makes crucial use 
of Patarin’s H-coefficient method [48]. The main difference is that our version 
deals with the multi-user PRP security notion. (A self-contained version of the 
proof is found in the full version). 

Given a (κ, n)-block cipher BC[F] relying on a function F : {0,1}^m → {0,1}^ℓ,
let BC[F_1] ∘ BC^{-1}[F_2] be the (2κ, n)-block cipher which relies on two functions
F_1, F_2 : {0,1}^m → {0,1}^ℓ, and which on input X ∈ {0,1}^n and given key
K_1 ‖ K_2 ∈ {0,1}^{2κ}, returns BC[F_2]^{-1}_{K_2}(BC[F_1]_{K_1}(X)). The following lemma tells
us that if BC is non-adaptively secure (as in the above notion), then BC ∘ BC^{-1}
is adaptively secure in the sense of being a secure PRP for attackers making both
forward and backward queries.

Lemma 1 (Non-adaptive ⇒ Adaptive Security). For all q, q_F, we have

    \mathrm{Adv}^{\mathrm{PRP}}_{BC[F_1] \circ BC^{-1}[F_2], (F_1, F_2)}(q, q_F) \le 4 \cdot \sqrt{\mathrm{Adv}^{\mathrm{NCPA\text{-}PRP}}_{BC[F],F}(q, q_F)} .

Note that a stronger version of this statement (essentially without the square
root) can be proved [18,43] in the setting where q_F = 0.

Non-adaptive analysis of WSN. We first adopt a slightly different representation
of the WSN construction. In particular, let $\mathsf{WSN}^{(r)} = \mathsf{WSN}^{(r)}[F]$ be the
construction relying on a function $F : \{0,1\}^n \to \{0,1\}$ which operates as the
original WSN construction for $r$ rounds, but always uses the function $F$
(instead of using one function $F_1$ for the first half, and the function $F_2$ for the
second half of the evaluation). Then, it is easy to see that

$$\mathsf{WSN}^{(2r)}[F_1,F_2] = \mathsf{WSN}^{(r)}[F_1] \circ \left(\mathsf{WSN}^{(r)}[F_2]\right)^{-1} , \qquad (1)$$


where in particular we have used the fact that the inverse of WSN is just the 
WSN itself, with round and whitening keys scheduled in the opposite order. 
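The observation that WSN is inverted by running it with the key schedule reversed can be checked on a toy instance. The sketch below assumes a swap-or-not round of the form X ← X ⊕ K[t] whenever F(max{X, X ⊕ K[t]} ⊕ W[t]) = 1, which is the form suggested by the proof of Lemma 2; the hash-based `F`, the 8-bit block size, and all function names are illustrative stand-ins, not the paper's instantiation.

```python
import hashlib
import random


def F(x: int) -> int:
    """Toy stand-in for the public random function F: {0,1}^n -> {0,1}."""
    return hashlib.sha256(x.to_bytes(8, "big")).digest()[0] & 1


def wsn(x, round_keys, whitening_keys):
    """Apply one whitened swap-or-not round per (K, W) pair, in the given order."""
    for k, w in zip(round_keys, whitening_keys):
        # x and x ^ k yield the same max{...}, so both partners query F at the
        # same point and each single round is an involution.
        if F(max(x, x ^ k) ^ w) == 1:
            x ^= k
    return x


def wsn_inverse(y, round_keys, whitening_keys):
    """Inverse: the same procedure with keys scheduled in the opposite order."""
    return wsn(y, round_keys[::-1], whitening_keys[::-1])


# Toy parameters: n = 8 bits, 10 rounds, keys drawn from a seeded generator.
rng = random.Random(0)
ROUND_KEYS = [rng.randrange(256) for _ in range(10)]
WHITENING_KEYS = [rng.randrange(256) for _ in range(10)]
```

Because every round is its own inverse, undoing the cipher only requires reversing the order of the rounds, which is what `wsn_inverse` does.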

The key element of our proof is the following lemma, which, combined with 
Lemma 1 and Eq. (1) immediately yields Theorem 1. 

Lemma 2 (Non-adaptive Security of WSN). For all $q$ and $q_F$, and $N = 2^n$,


Adv 


NCPAPRP 

WSN (r) [F],F 


(q,qf) < \q^N Q 


i \ r / 2 

q • r + g F \ 

2 N ) 


Proof (of Lemma 2). We fix a sequence of $q$ distinct queries $X$, as well as a set
$T_F$ of $q_F$ input-output pairs. For now, we only consider the single-key setting, i.e.,
all queries $X[j]$ are of the same index $i_j = 1$, and thus we omit these indices $i_j$.
(We argue below how the multi-user case follows easily from our proof.) Denote
the randomly chosen round keys as $K = (K[1], \ldots, K[r])$ and the corresponding
whitening keys as $W = (W[1], \ldots, W[r])$.

We are going to consider the evolution of the evaluation of WSN on these
inputs simultaneously, and denote the joint state after $t \in \{0\} \cup [r]$ rounds as
$X_t = (X_t[1], \ldots, X_t[q])$, with $X_0 = X$. With $U$ uniformly distributed on the set
of sequences of $q$ distinct $n$-bit strings, we are going to upper bound

$$\mathsf{Adv}^{\text{NCPA-PRP}}_{\mathsf{WSN}^{(r)}[F],F}(X, T_F) = \mathsf{SD}(X_r, U) .$$


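For any $i \in [q]$, denote by $Q_t[i]$ the set of input-output pairs corresponding to
the $t$ $F$-queries made to compute $X_t[i]$ from $X_0[i]$, and write $X_t^{i} = (X_t[1], \ldots, X_t[i])$
and $Q_t^{i} = (Q_t[1], \ldots, Q_t[i])$ for the corresponding prefixes. Let now $U_{t,i}$ be a uniformly
distributed value on the set $S_{t,i} := \{0,1\}^n \setminus \{X_t[1], \ldots, X_t[i-1]\}$, and let $\mathbf{U}_{t,i}$
be a uniform $(q-i)$-tuple of distinct strings from $S_{t,i+1}$. Then, for all $t \in \{0\} \cup [r]$,

$$\begin{aligned}
\mathsf{SD}(X_t, U) &\le \sum_{i=1}^{q} \mathsf{SD}\big((X_t^{i-1}, \mathbf{U}_{t,i-1}),\, (X_t^{i}, \mathbf{U}_{t,i})\big) \\
&\le \sum_{i=1}^{q} \mathsf{SD}\big((Q_t^{i-1}, X_t^{i-1}, U_{t,i}, \mathbf{U}_{t,i}),\, (Q_t^{i-1}, X_t^{i}, \mathbf{U}_{t,i})\big) \\
&= \sum_{i=1}^{q} \mathsf{SD}\big((Q_t^{i-1}, X_t^{i}),\, (Q_t^{i-1}, X_t^{i-1}, U_{t,i})\big) = \sum_{i=1}^{q} E\big[\mathsf{SD}(X_t[i], U_{t,i})\big] , \qquad (2)
\end{aligned}$$

since $\mathsf{SD}(f(X), f(Y)) \le \mathsf{SD}(X, Y)$ for all $f, X, Y$, and the $i$-th expectation in
the sum is over $Q_t^{i-1}$, $X_t^{i-1}$, $W_{\le t}$, and $K_{\le t}$.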

For all $a \in S_{t,i}$, we now define the random variable $p_{t,i}(a)$ as the probability
that $X_t[i] = a$ conditioned on the actual values taken by the random
variables $Q_t^{i-1}$, $X_t^{i-1}$, $W_{\le t}$, $K_{\le t}$. (In particular, $p_{t,i}(a)$ is a random variable
itself, as it is a function of these random variables.) Also, let $N_i := N - i + 1$.
Then, by Cauchy-Schwarz and Jensen's inequalities, we obtain

$$E\big[\mathsf{SD}(X_t[i], U_{t,i})\big] \le \frac{1}{2}\sqrt{N_i \cdot E[\Delta_{t,i}]} . \qquad (3)$$

We are going to give a recursive formula for $E[\Delta_{t,i}]$, where

$$\Delta_{t,i} := \sum_{a \in S_{t,i}} \Big(p_{t,i}(a) - \frac{1}{N_i}\Big)^2 .$$

Note that $\Delta_{0,i} = E[\Delta_{0,i}] = 1 - \frac{1}{N_i}$. It is now convenient to assume that $Q_t^{i-1}$,
$X_t^{i-1}$, $K_{\le t}$, $W_{\le t}$ are fixed to some values (and thus so are $\Delta_{t,i}$ and $p_{t,i}(a)$),
and we are going to study $E[\Delta_{t+1,i}]$, where the expectation is now over $X_{t+1}^{i-1}$,
$K[t+1]$, $W[t+1]$ and $Q_{t+1}^{i-1}$. In particular, define $\mathcal{Q}_b$ (for $b \in \{0,1\}$) to be the
set of all inputs of queries to $F$ for which we know the corresponding output,
i.e., $x \in \mathcal{Q}_b$ if $(x,b) \in T_F$ or $(x,b) \in Q_t[j]$ for some $j \in [i-1]$. Moreover let
$\mathcal{Q} := \mathcal{Q}_0 \cup \mathcal{Q}_1$ and $Q := |\mathcal{Q}|$, and note that $Q \le t \cdot (i-1) + q_F$.

With the above being fixed, we are now considering the random experiment
where we sample $K[t+1]$ and $W[t+1]$, and we are going to compute the
expectation of $\Delta_{t+1,i}$ in this experiment. More concretely, we define a function
$\varphi : S_{t,i} \to S_{t+1,i}$ (which is also a random variable, as it depends on
$K[t+1]$ and $W[t+1]$) as follows:

$$\varphi(a) = \begin{cases}
a & \text{if (1) } \max\{a \oplus K[t+1], a\} \oplus W[t+1] \in \mathcal{Q}_0, \text{ or} \\
  & \text{(2) } a \oplus K[t+1] \notin S_{t+1,i} \text{ and } \max\{a \oplus K[t+1], a\} \oplus W[t+1] \notin \mathcal{Q}, \text{ or} \\
  & \text{(3) } a \oplus K[t+1] \in S_{t,i} \text{ and } \max\{a \oplus K[t+1], a\} \oplus W[t+1] \notin \mathcal{Q}, \\
a \oplus K[t+1] & \text{if (4) } \max\{a \oplus K[t+1], a\} \oplus W[t+1] \in \mathcal{Q}_1, \text{ or} \\
  & \text{(5) } a \notin S_{t+1,i} \text{ and } \max\{a \oplus K[t+1], a\} \oplus W[t+1] \notin \mathcal{Q}.
\end{cases}$$

Note that $\varphi$ is a bijection. Indeed, if $X_t[i] = a$ implies $X_{t+1}[i] = a'$ (where
$a' \in \{a, a \oplus K[t+1]\}$), then $\varphi(a) = a'$ (this corresponds to exactly one of the
first four cases), and otherwise we let $\varphi(a) = a$. Also note that $\varphi$ does not depend
(directly) on $Q_{t+1}^{i-1}$, only on $S_{t+1,i}$, $K[t+1]$, $W[t+1]$, and $Q_t^{i-1}$. Using both
the bijectivity of $\varphi$ as well as the linearity of expectation,


$$E[\Delta_{t+1,i}] = \sum_{a \in S_{t,i}} E\Big[\Big(p_{t+1,i}(\varphi(a)) - \frac{1}{N_i}\Big)^2\Big] .$$

Recall that the expectation here is over the choice of $Q_{t+1}^{i-1}$, $K[t+1]$ and
$W[t+1]$. We prove the following lemma in Appendix C.

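Lemma 3. For all $a \in S_{t,i}$,

$$E\Big[\Big(p_{t+1,i}(\varphi(a)) - \frac{1}{N_i}\Big)^2\Big] = \Big(1 - \frac{3 N_i (N-Q)}{4 N^2}\Big)\Big(p_{t,i}(a) - \frac{1}{N_i}\Big)^2 + \frac{N-Q}{4N^2}\,\Delta_{t,i} .$$

We can thus replace $E\big[(p_{t+1,i}(\varphi(a)) - \frac{1}{N_i})^2\big]$ in the above, and using the fact
that $\Delta_{t,i} = \sum_{a \in S_{t,i}} (p_{t,i}(a) - \frac{1}{N_i})^2$, this simplifies to

$$E[\Delta_{t+1,i}] = \sum_{a \in S_{t,i}} E\Big[\Big(p_{t+1,i}(\varphi(a)) - \frac{1}{N_i}\Big)^2\Big] \le \Big(1 - \frac{N_i (N - Q_t)}{2 N^2}\Big)\, \Delta_{t,i} ,$$

where $Q_t = t(i-1) + q_F$. Now, we come back to thinking of $X_t^{i-1}$, $K_{\le t}$ and $W_{\le t}$
as being randomly chosen (rather than fixed), and evaluate $E[\Delta_{t,i}]$ recursively.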

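The above in particular implies that

$$E[\Delta_{t,i}] \le \Big(1 - \frac{N_i (N-Q)}{2 N^2}\Big)\, E[\Delta_{t-1,i}] , \quad\text{and thus}\quad E[\Delta_{t,i}] \le \Big(\frac{1}{2} + \frac{r \cdot q + q_F}{2N}\Big)\, E[\Delta_{t-1,i}] .$$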


Now, we can put this together with (2) and (3), and see that 


Note that for the multi-user case, the proof is essentially the same, with slightly
more complex notation. The only difference is that we define $S_{t,i}$ and all related
quantities only with respect to the previous queries for the same key / user. The
upper bounds are the same however, as they only depend on $N$, $q$ and $q_F$. This
concludes the proof of Lemma 2. □
4 Related-Key Security

4.1 Related-Key Secure Key Derivation 

We consider the general notion of a related-key secure key-derivation function,
or RKA-KDF for short. Informally, for a class of key-transformation functions
$\Phi \subseteq \mathsf{Fcs}(\kappa, \kappa)$, this is a function $\mathsf{KDF} : \{0,1\}^\kappa \to \{0,1\}^\ell$ such that $\mathsf{KDF}(\phi(K))$
gives independent, pseudorandom values for every $\phi \in \Phi$. A similar notion was
considered by Lucks [40] and by Barbosa and Farshim [3].

Formal definition. Let $\mathsf{KDF}[F] : \{0,1\}^\kappa \to \{0,1\}^\ell$ be a construction that
calls a function $F : \{0,1\}^m \to \{0,1\}^n$. In Fig. 1, we define the security games
RKA-KDF-0 and RKA-KDF-1 involving an adversary $A$ and a class of key
transformations $\Phi \subseteq \mathsf{Fcs}(\kappa, \kappa)$. In the real world (Game RKA-KDF-0), the adversary
$A$ makes queries to a random function $F$ via the F oracle and can obtain
evaluations of $\mathsf{KDF}[F](\phi(K))$ for multiple $\phi \in \Phi$ of its choice via the Eval oracle,
and these values should be indistinguishable from random values, which are
returned by the Eval oracle in the ideal world (i.e., in Game RKA-KDF-1). The
RKA-KDF-advantage is then defined as

$$\mathsf{Adv}^{\text{RKA-KDF}}_{\mathsf{KDF},F,\Phi}(A) = \Pr\big[\text{RKA-KDF-0}^{A}_{\mathsf{KDF},F,\Phi} \Rightarrow 1\big] - \Pr\big[\text{RKA-KDF-1}^{A}_{\mathsf{KDF},F,\Phi} \Rightarrow 1\big] ,$$

and $\mathsf{Adv}^{\text{RKA-KDF}}_{\mathsf{KDF},F,\Phi}(q, q_F)$ is obtained by maximizing the above over all adversaries
making $q$ queries to Eval and making $q_F$ queries to $F$ via the F oracle.


Procedure MAIN:   // Game RKA-KDF-b, b ∈ {0, 1}
    F ← Fcs(m, n), G ← Fcs(*, ℓ)
    K ← {0,1}^κ
    b' ← A^{F, Eval}
    Return b'

Procedure Eval(φ):   // Game RKA-KDF-b, b ∈ {0, 1}
    If b = 0 then
        Return KDF[F](φ(K))
    Else return G(φ)

Procedure F(x):
    Return F(x)


Fig. 1. RKA-KDF security. The procedure Eval, in both games, takes as input a
function $\phi \in \Phi$. Also, the notation $G(\phi)$ denotes $G$ applied to some unique bit-encoding
of the function $\phi$.

Remark 1. An alternative definition has the Eval oracle return $G(\phi(K))$ for a
random function $G \leftarrow \mathsf{Fcs}(\kappa, \ell)$. Our choice is better suited to the composition
theorem below, and shifts the burden of dealing with the combinatorics of $\Phi$ to
the RKA-KDF security proof.

The composition theorem. We can compose an arbitrary $(\ell, n)$-block cipher
construction $\mathsf{BC}[F]$ and a key-derivation function $\mathsf{KDF} : \{0,1\}^\kappa \to \{0,1\}^\ell$ using
the same function $F$, into a new $(\kappa, n)$-block cipher $\overline{\mathsf{BC}} = \overline{\mathsf{BC}}[F, \mathsf{KDF}]$ such that

$$\overline{\mathsf{BC}}_K(X) = \mathsf{BC}_{\mathsf{KDF}(K)}(X) \qquad (4)$$

for every $K \in \{0,1\}^\kappa$ and $X \in \{0,1\}^n$. The following theorem shows that if
BC is a secure PRP and KDF is RKA-KDF secure, then the composition $\overline{\mathsf{BC}}$ is
a related-key secure PRP. Note that the fact that we consider multi-user PRP
security is central in allowing us a tight reduction.

Theorem 2 (The Composition Theorem). Let $\overline{\mathsf{BC}} = \overline{\mathsf{BC}}[F, \mathsf{KDF}]$ be the
$(\kappa, n)$-block cipher defined above, and assume that BC makes at most $t$ calls to $F$
upon each invocation. Let $\Phi \subseteq \mathsf{Fcs}(\kappa, \kappa)$ be a class of key transformations. Then,
for all $q, q_F$,

$$\mathsf{Adv}^{\text{RKA-PRP}}_{\overline{\mathsf{BC}},F,\Phi}(q, q_F) \le 2 \cdot \mathsf{Adv}^{\text{RKA-KDF}}_{\mathsf{KDF},F,\Phi}(q, q_F + q \cdot t) + \mathsf{Adv}^{\text{PRP}}_{\mathsf{BC},F}(q, q_F) .$$

Proof (Sketch). One uses RKA-KDF security to transition from RKA-PRP-1 to a
setting where each query $(\phi, x)$ to the block cipher is answered using an independent
key $K_\phi$, i.e., we map every $\phi$ to an independent $\kappa$-bit key $K_\phi$.
This is exactly PRP-1 (except that users are now identified by elements of $\Phi$)
and results in the additive term $\mathsf{Adv}^{\text{RKA-KDF}}_{\mathsf{KDF},F,\Phi}(q, q_F + q \cdot t)$ in the bound by a
standard reduction. Similarly, one uses RKA-KDF security to transition from
RKA-PRP-0 to a setting where each query $(\phi, x)$ to the block cipher is answered
using an independent permutation $P_\phi$, and this exactly maps to PRP-0, and
results in another additive term $\mathsf{Adv}^{\text{RKA-KDF}}_{\mathsf{KDF},F,\Phi}(q, q_F + q \cdot t)$. The final bound follows
by the triangle inequality. □

Note that in a similar way, if KDF and BC use different functions $F$ and $F'$, then
we can reduce $\mathsf{Adv}^{\text{RKA-KDF}}_{\mathsf{KDF},F,\Phi}(q, q_F + q \cdot t)$ to $\mathsf{Adv}^{\text{RKA-KDF}}_{\mathsf{KDF},F,\Phi}(q, q_F)$.


4.2 Efficient RKA-KDF-secure Construction 


This section presents an RKA-KDF-secure construction from a (small number of)
random functions with $n$-bit domain approaching $(2^{n(1-\epsilon)}, 2^{n(1-\epsilon)})$-security. (As
we argue below, this can be turned into a construction from a single function
$F : \{0,1\}^n \to \{0,1\}$ with standard tricks.) Our construction will guarantee
$\Phi$-RKA-KDF-security for every class $\Phi \subseteq \mathsf{Fcs}(\kappa, \kappa)$ with the following two properties
for (small) parameters $\gamma, \lambda \in [0, 1]$:


$\gamma$-collision resistance. $\Pr[K \leftarrow \{0,1\}^\kappa : \phi(K) = \phi'(K)] \le \gamma$ for any two
distinct $\phi, \phi' \in \Phi$.

$\lambda$-uniformity. For any $\phi \in \Phi$, we have that $\mathsf{SD}(K, \phi(K)) \le \lambda$ for $K \leftarrow \{0,1\}^\kappa$,
i.e., $\phi(K)$ is $\lambda$-close to uniform for a random key $K$.


For example, $\Phi^{\oplus} = \{K \mapsto K \oplus \Delta : \Delta \in \{0,1\}^\kappa\}$ is both 0-collision-resistant and
0-uniform.
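The claim about the XOR class can be verified exhaustively for a small key length. The sketch below is an illustrative check only: it assumes κ = 4, represents keys as integers, and uses helper names chosen for this example.

```python
from collections import Counter
from fractions import Fraction

KAPPA = 4                      # toy key length (assumption for the example)
KEYS = range(2 ** KAPPA)


def xor_shift(delta):
    """The key transformation K -> K xor delta."""
    return lambda key: key ^ delta


PHI_XOR = {delta: xor_shift(delta) for delta in KEYS}


def collision_probability(phi1, phi2):
    """Pr over a uniform key K that phi1(K) == phi2(K)."""
    return Fraction(sum(phi1(k) == phi2(k) for k in KEYS), len(KEYS))


def distance_from_uniform(phi):
    """Statistical distance between phi(K), for uniform K, and a uniform key."""
    counts = Counter(phi(k) for k in KEYS)
    uniform = Fraction(1, len(KEYS))
    return sum(abs(Fraction(counts.get(y, 0), len(KEYS)) - uniform) for y in KEYS) / 2


gamma = max(collision_probability(PHI_XOR[d1], PHI_XOR[d2])
            for d1 in KEYS for d2 in KEYS if d1 != d2)
lam = max(distance_from_uniform(phi) for phi in PHI_XOR.values())
```

Both maxima come out as zero because two distinct offsets never map a key to the same value, and every XOR shift is a bijection on the key space.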


Combinatorial hitters. Our construction makes use of the standard combinatorial
notion of a hitter [30], which we introduce with a slightly different
parameterization than what is used in the literature. Consider a family of
functions $E = (E_1, \ldots, E_t)$ such that $E_i : \{0,1\}^\kappa \to \{0,1\}^n$.

Definition 1 (Hitters). The functions $E = (E_1, \ldots, E_t)$ with $E_i : \{0,1\}^\kappa \to \{0,1\}^n$
are an $(\alpha, \beta)$-hitter if for all subsets $Q \subseteq \{0,1\}^n$ with $|Q| \le \beta \cdot 2^n$,

$$\Pr\big[K \leftarrow \{0,1\}^\kappa : \forall i \in [t] : E_i(K) \in Q\big] \le \alpha .$$
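For very small parameters, the hitter condition can be checked by brute force. The sketch below is purely illustrative: the two-function family (`low bits` and `high bits` of a 4-bit key) is a hypothetical toy, not one of the explicit hitter constructions referred to in the text, and the helper names are chosen for this example. Since enlarging Q can only increase the probability, it suffices to enumerate sets of the maximum allowed size.

```python
from fractions import Fraction
from itertools import combinations


def worst_case_probability(family, kappa, n, beta):
    """max over |Q| <= beta * 2^n of Pr_K[E_i(K) in Q for all i] (exhaustive search)."""
    size = int(beta * 2 ** n)
    images = [frozenset(e(k) for e in family) for k in range(2 ** kappa)]
    worst = Fraction(0)
    for subset in combinations(range(2 ** n), size):
        q_set = set(subset)
        trapped = sum(image <= q_set for image in images)   # keys whose images all lie in Q
        worst = max(worst, Fraction(trapped, 2 ** kappa))
    return worst


def is_hitter(family, kappa, n, alpha, beta):
    """Check the (alpha, beta)-hitter condition of Definition 1 by enumeration."""
    return worst_case_probability(family, kappa, n, beta) <= alpha


# Hypothetical toy family: kappa = 4, n = 3, two functions.
TOY_FAMILY = [lambda k: k & 7, lambda k: k >> 1]
toy_worst = worst_case_probability(TOY_FAMILY, kappa=4, n=3, beta=Fraction(1, 4))
```

The enumeration is exponential in 2^n and only serves to make the quantifiers in the definition concrete.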

In our setting, we are going to have $\beta = 2^{-n\epsilon}$ (for some (small) $\epsilon > 0$, and
in particular $1 - \beta > \frac{1}{2}$) and $\alpha = 2^{-n}$. There are polynomially-computable
explicit constructions of hitters (cf. e.g. [30] for an overview) with sufficiently
good parameters for our purposes, where

$$\kappa = 2n + O(\log(1/\alpha)) = O(n) , \qquad t = O(\log(1/\alpha)) = O(n) . \qquad (5)$$

The full version gives further details about a concrete example of a "reasonably"
cheap construction relying on random walks on constant-degree expander graphs.
We will require our hitters to be injective, i.e., for any two distinct inputs $X$ and $X'$,
there must exist $i$ such that $E_i(X) \ne E_i(X')$. It is easy to enforce injectivity for
any hitter by just adding $O(\kappa/n)$ functions to the family.

The MT Construction. We now present our construction of an RKA-KDF-secure
function, which follows the framework of Maurer and Tessaro [45]. Let $E =
(E_1, \ldots, E_t)$ be such that $E_i : \{0,1\}^\kappa \to \{0,1\}^n$. Moreover, let $F_{i,j} : \{0,1\}^n \to
\{0,1\}^{2\kappa+n}$ for $i \in [t]$ and $j \in [r]$, and $G_j : \{0,1\}^n \to \{0,1\}^\ell$ for $j \in [r]$. For simplicity,
denote $F = (F_{i,j})_{i \in [t], j \in [r]}$ and $G = (G_j)_{j \in [r]}$.

The MT[E, F, G] construction operates as follows. (Here, $\odot$ denotes multiplication
of $(2\kappa+n)$-bit strings interpreted as elements of the corresponding extension
field $\mathbb{F}_{2^{2\kappa+n}}$.)

Construction $\mathsf{MT}[F,G](K)$: // $K \in \{0,1\}^\kappa$

(1) For all $j \in [r]$, compute $S[j] \leftarrow \bigodot_{i=1}^{t} F_{i,j}(E_i(K))$, truncated to $n$ bits.

(2) Compute $K' \leftarrow \bigoplus_{j=1}^{r} G_j(S[j])$.

(3) Return $K'$.


RKA-KDF security. The above construction is indifferentiable from a random
oracle [22,45] whenever E is a so-called input-restricting function family. While
this combinatorial property would also imply RKA-KDF security, explicit
constructions of such function families require a very large $t = O(n^c)$ for a large
constant $c$, as discussed in [22].

Here, in contrast, we show that for RKA-KDF security it is sufficient if E 
is a good hitter. The following theorem summarizes the concrete parameters of 
our result. The complete proof is deferred to the full version for lack of space. 
We give some intuition further below. 
Theorem 3 (RKA-KDF-Security of MT). Let $E$ be an $(\alpha, \beta = q_F/2^n)$-injective
hitter. Moreover, let $\Phi \subseteq \mathsf{Fcs}(\kappa, \kappa)$ be a $(\gamma, \lambda)$-well-behaved set of key
transformations. Then, for all adversaries $A$ making $q$ queries to Eval, $q_F$ queries to the
F-functions, and $q_G$ queries to the G-functions,


Adv 


RKA-KDF 

MT ,(F,G),<£ 


M)< 


Art 

2 n 


+ q(a + A) + q 2 7 + q • 


( qc + q 

V 2™ 


r 


Instantiations. Let us target security for $q_F = q = 2^{n(1-\epsilon)}$ (e.g., $\epsilon = O(1/n)$),
$\ell = n$, and additive attacks $\Phi = \Phi^{\oplus}$ with $\gamma = \lambda = 0$. First note that because we
want $\alpha \approx 2^{-n}$ and $\beta = 2^{-\epsilon n}$, we can use $E$ with $\kappa = O(n)$ and $t = O(n)$ by
(5). Moreover, we need to ensure that $2^{r(1-n)} \cdot 2^{n(1-\epsilon)(r+1)} < 1$ or alternatively
$r(n\epsilon - 1) > n(1 - \epsilon)$, which is true for a suitable $r = r(\epsilon)$, and $r = O(n)$ for
$\epsilon = O(1/n)$.
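The equivalence between the two conditions on $r$ stated above follows by taking base-2 logarithms. Spelled out as a short worked check (added here for convenience, not part of the original derivation):

```latex
\begin{align*}
2^{r(1-n)} \cdot 2^{n(1-\epsilon)(r+1)} < 1
  &\iff r(1-n) + n(1-\epsilon)(r+1) < 0 \\
  &\iff r - rn + rn - \epsilon n r + n(1-\epsilon) < 0 \\
  &\iff r\,(n\epsilon - 1) > n(1-\epsilon) .
\end{align*}
```

In particular, whenever $n\epsilon > 1$, the condition is the same as $r > n(1-\epsilon)/(n\epsilon - 1)$.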

Therefore, the construction evaluates a linear number of functions with linear
output $O(n)$, or alternatively, $O(n^2)$ single-bit functions $\{0,1\}^n \to \{0,1\}$. This
can be turned into evaluating $O(n^2)$ times one single function $\{0,1\}^{n + 2\log n + O(1)} \to
\{0,1\}$.$^7$ Improving upon this appears to be a significant barrier.

The MT construction can be combined with the WSN construction above to
obtain an RKA-secure block cipher with $(2^{n(1-\epsilon)}, 2^{n(1-\epsilon)})$-security via Theorem 2
for any class $\Phi$ with small $\lambda, \gamma$.

Overview of the proof of Theorem 3. We explain here the basic ideas 
behind the proof of Theorem 3. 

To start with, it is convenient to first consider a toy construction, using
only $t$ functions $F = (F_i)_{i \in [t]}$ with $F_i \in \mathsf{Fcs}(n, \ell)$, in conjunction with a hitter
$E = (E_1, \ldots, E_t)$ as above. On input $K \in \{0,1\}^\kappa$, it outputs $\bigoplus_{i=1}^{t} F_i(E_i(K))$.
Also, let us only consider RKA-KDF attackers which make all $q_F$ of their F
queries beforehand, and only then query Eval on inputs $\phi_1, \ldots, \phi_q$, where the $\phi_i$
functions are such that $\phi_i(K)$ is uniform for a uniform $K$.

Assume without loss of generality that the uniform key $K$ is sampled after the
F-queries have been made. Since $E$ is an $(\alpha, \beta = q_F/2^n)$-hitter, then by the
union bound, for every $k \in [q]$ there exists some $i^*(k)$ such that $E_{i^*(k)}(\phi_k(K))$
was not queried in the first phase, except with probability $q \cdot \alpha$. Therefore,
for all $k \in [q]$, the value $\bigoplus_{i=1}^{t} F_i(E_i(\phi_k(K)))$ is individually uniform, even
given the transcript of the F queries, but unfortunately, this does not guarantee
independence of these outputs. Indeed, for two $k$ and $k'$, we may well
have $i^*(k) = i^*(k')$, and we cannot exclude that for all $i \ne i^*(k)$ both values
$F_i(E_i(\phi_k(K)))$ and $F_i(E_i(\phi_{k'}(K)))$ are known as part of the F-queries made in
the first phase. Then, the output values for $k$ and $k'$ are clearly correlated.

Instead, by using two rounds with functions $(F_{i,j})_{i \in [t], j \in [r]}$ and $(G_j)_{j \in [r]}$
(where $F_{i,j} \in \mathsf{Fcs}(n, n)$ and $G_j \in \mathsf{Fcs}(n, \ell)$), we would generate values
$S_k[j] \leftarrow \bigoplus_{i=1}^{t} F_{i,j}(E_i(\phi_k(K)))$, hoping that, in addition to being individually
uniform as above, $S_k[j]$ and $S_{k'}[j]$ are unlikely to collide for any $k \ne k'$.

7 Note that we can play a bit with parameters, and given a function $F : \{0,1\}^n \to
\{0,1\}$, interpret it as a function $\{0,1\}^{n' + 2\log(n')} \to \{0,1\}$ for a suitable $n'$ only
marginally smaller than $n$, and obtain an instantiation of our construction with
respect to $n'$ still making roughly $O(n^2)$ calls to $F$.

If the final output of the construction is $\bigoplus_{j=1}^{r} G_j(S_k[j])$, the above would
imply security: Indeed, with very high probability, we can show that for every
$k$, there is always going to exist some $j^*$ such that $S_k[j^*]$ was never queried to
$G_{j^*}$ previously directly by the attacker (because of the individual uniformity of
the value) and that no other $k' \ne k$ is such that $S_{k'}[j^*] = S_k[j^*]$. (Exploiting
independence of the $S_k[j]$'s, the probability that such $j^*$ does not exist can be
made very small.)

There is a final catch. Imagine we are in the above "unfortunate" setting,
i.e., for two $k$ and $k'$ and $j \in [r]$, we have $i^*(k) = i^*(k')$, and for all $i \ne
i^*(k)$, $F_{i,j}(E_i(\phi_k(K)))$ and $F_{i,j}(E_i(\phi_{k'}(K)))$ are known. Then, the fact that $S_k[j]$
and $S_{k'}[j]$ collided is already determined by the transcript of the F queries.
Our approach to address this problem
is to make the output of the F-values larger (roughly $2\kappa + n$ bits) and to use
multiplication. This will make sure that any two partial products
defined by the F queries as above will not collide (over $2\kappa + n$ bits), and thus
(by the fact that multiplication with truncation gives a universal hash function),
the final products, truncated at $n$ bits, will also be unlikely to collide.

A Indifferentiability 


We briefly review the notion of indifferentiability by Maurer et al. [44] as needed 
in this paper. 

Let $C[G] : \{0,1\}^m \to \{0,1\}^\ell$ be a construction from a function $G : \{0,1\}^a \to
\{0,1\}^b$. We say that $C$ is indifferentiable from a random function if $C[G]$, for
$G \leftarrow \mathsf{Fcs}(a, b)$, is "as good as" a randomly chosen function $F \leftarrow \mathsf{Fcs}(m, \ell)$ in a setting
where an adversary is given access to both $C[G]$ and the underlying function
$G$. This is formalized by requiring the existence of a simulator $S$, accessing $F$,
which mimics the behavior of $G$ in a way that makes real and ideal worlds
indistinguishable.

Formal Definition. For an adversary $A$ and a simulator $S$, the indifferentiability
advantage is

$$\mathsf{Adv}^{\text{indiff}}_{C,S}(A) = \Pr\big[G \leftarrow \mathsf{Fcs}(a,b) : A^{C[G],G} \Rightarrow 1\big] - \Pr\big[F \leftarrow \mathsf{Fcs}(m,\ell) : A^{F,S^F} \Rightarrow 1\big] .$$

Similarly, for a construction $C[\pi]$ from a permutation $\pi \in \mathsf{Perms}(a)$, we define

$$\mathsf{Adv}^{\text{indiff}}_{C,S}(A) = \Pr\big[\pi \leftarrow \mathsf{Perms}(a) : A^{C[\pi],\pi,\pi^{-1}} \Rightarrow 1\big] - \Pr\big[F \leftarrow \mathsf{Fcs}(m,\ell) : A^{F,S^F} \Rightarrow 1\big] .$$
Note that in the latter case, the simulator $S$ simulates both the behavior of
$\pi$ and $\pi^{-1}$ queries. We are going to call queries to the first oracle (i.e., either
$C[G]$, $C[\pi]$ or $F$) construction queries, and queries to the second oracle (either
$G$, $\pi$, $\pi^{-1}$, or $S^F$) primitive queries.

In this paper, we are going to only consider an information-theoretic version
of indifferentiability.

Definition 2 (Indifferentiability). A construction $C[F]$ (where $F$ is either a
permutation or a function) is $(\epsilon, s)$-indifferentiable from a random function if
there exists a simulator $S$ such that for all adversaries $A$ making $q$ construction
queries and $q_\Sigma$ primitive queries, $\mathsf{Adv}^{\text{indiff}}_{C,S}(A) \le \epsilon(q, q_\Sigma)$, and where
additionally, upon each invocation via a primitive query, the simulator $S$ makes at
most $s$ queries. Moreover, the simulator answers each query in time polynomial
in $q_\Sigma$.

We say that $C[F]$ is perfectly indifferentiable if it is $(0, 1)$-indifferentiable.

Composition Theorem. We use the following fact below, which follows from 
general composition theorems [21,44] adapted to the specific case of block ciphers 
considered in this paper. 

Theorem 4 (Composition Theorem for Block Ciphers). Let BC = BC[F] 

be a (ft, n) -block cipher making at most t calls to a function F : {0, l} m — > {0, 1} £ , 
and let C[A] be a construction using a primitive F which is (e, s) -indifferentiable 
from a random function. Consider the ( K,n)-block cipher BC' = BC '[F] = 
BC[C[A]] ; i.e., calls to F are replaced by calls to C [F]. Then, 

qz) < Advg^ F ] ?F (g, s • qjj) + 2 • eft • q, qz) . 

B From Permutations to Functions 

In this section, we revisit the security of a construction by Hall, Wagner, Kelsey,
and Schneier [32] to build a random function $F : \{0,1\}^n \to \{0,1\}^c$ from a
permutation $\pi : \{0,1\}^{n+c} \to \{0,1\}^{n+c}$. In particular, here we show that their
construction achieves the stronger notion of perfect indifferentiability defined
above in Appendix A, and thus can be used to replace (in a black-box way) the
function $F$ in the WSN construction. Note that in [32], only indistinguishability
was shown. We believe that this result is of interest beyond the scope of this
paper.

The construction. Let $\pi : \{0,1\}^{n+c} \to \{0,1\}^{n+c}$ be a permutation. The
$2^c$-query construction $F_c[\pi] : \{0,1\}^n \to \{0,1\}^c$ proceeds as follows, on input
$X \in \{0,1\}^n$: It outputs the $c$-bit value $Z^*$ such that $\pi(X \,\|\, Z^*)$ is the smallest
element in $\{\pi(X \,\|\, Z) : Z \in \{0,1\}^c\}$, where smallest is according to lexicographic
order. (Or any other total order on strings.)
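The construction just described is short enough to write out directly. The sketch below is a toy illustration: it assumes the permutation is given as a lookup table over integers, encodes $X \,\|\, Z$ as `(x << c) | z`, and compares outputs as integers (which matches lexicographic order on fixed-length bit strings). The names `make_permutation` and `F_c` are chosen for this example.

```python
import random


def make_permutation(bits, seed):
    """A toy random permutation on `bits`-bit values, stored as a lookup table."""
    table = list(range(2 ** bits))
    random.Random(seed).shuffle(table)
    return table


def F_c(pi, x, c):
    """Return the c-bit Z* minimising pi(X || Z), using 2^c queries to pi."""
    return min(range(2 ** c), key=lambda z: pi[(x << c) | z])


# Toy parameters: n = 3 input bits, c = 2 output bits.
N_BITS, C_BITS = 3, 2
PI = make_permutation(N_BITS + C_BITS, seed=1)
OUTPUTS = [F_c(PI, x, C_BITS) for x in range(2 ** N_BITS)]
```

Each evaluation scans the whole block of $2^c$ inputs sharing the prefix $X$, which is why the construction is only practical for small $c$.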
Security. The following theorem establishes security of $F_c$ in terms of
indifferentiability.$^8$

Theorem 5 (Indifferentiability of $F_c$). The construction $F_c = F_c[\pi]$ is
perfectly indifferentiable from a random function.

Proof. We need to prove that there exists a simulator $S$ such that $\mathsf{Adv}^{\text{indiff}}_{F_c,S}(A) =
0$ for all adversaries $A$, and moreover, $S$ simulates a permutation from $\mathsf{Perms}(n +
c)$, together with its inverse, and makes at most one single query to a given
function $F \leftarrow \mathsf{Fcs}(n, c)$ upon each invocation.

To help with the definition of the simulator, for a function $f \in \mathsf{Fcs}(n, c)$
and a permutation $\tau \in \mathsf{Perms}(n + c)$, we define a new permutation $\pi[\tau, f] \in
\mathsf{Perms}(n + c)$. To this end, for every $x \in \{0,1\}^n$, we define

$$y^*_x = \min \{\tau(x \,\|\, z) : z \in \{0,1\}^c\}$$

and $y_x = \tau(x \,\|\, f(x))$. Note that $y^*_x$ is the output of $\tau$ on input $x \,\|\, F_c[\tau](x)$ and
thus if $f = F_c[\tau]$, $y_x = y^*_x$. The permutation $\pi[\tau, f]$ is such that

$$\pi[\tau, f](x \,\|\, z) = \begin{cases}
y^*_x & \text{if } \tau(x \,\|\, z) = y_x, \text{ i.e., } f(x) = z \\
y_x & \text{if } \tau(x \,\|\, z) = y^*_x \\
\tau(x \,\|\, z) & \text{else.}
\end{cases}$$

In other words, $\pi[\tau, f]$ re-arranges $\tau$ to assign $\pi[\tau, f](x \,\|\, f(x))$ the smallest value
among $\tau(x \,\|\, z')$ for $z' \in \{0,1\}^c$. Clearly, given $\tau$, $\pi[\tau, f](x \,\|\, z)$ can be computed
with a single query to $f$ and $2^c$ queries to $\tau$. Moreover, note that the inverse
$\pi^{-1}[\tau, f]$ is

$$\pi^{-1}[\tau, f](y) = \begin{cases}
\tau^{-1}(y_x) & \text{if } y = y^*_x \\
\tau^{-1}(y^*_x) & \text{if } y = y_x \\
\tau^{-1}(y) & \text{else.}
\end{cases}$$

Note that the check $y = y^*_x$ and $y = y_x$ can be implemented by first computing
$\tau^{-1}(y)$, which returns $x \,\|\, z$, and then querying $\tau(x \,\|\, z')$ for all $z' \ne z$, as well as
$f(x)$. In particular, $\pi^{-1}[\tau, f]$ can also be evaluated with one query to $f$, given $\tau$.
The simulator $S$ now simply does the following when given oracle access to
$f$: It maintains a random permutation $\tau \leftarrow \mathsf{Perms}(n + c)$ (implemented via lazy
sampling), and on a forward query $x \,\|\, z$, replies as $\pi[\tau, f](x \,\|\, z)$, and on inverse
query $y$ it replies as $\pi^{-1}[\tau, f](y)$. By the above, this requires one $f$ query per
evaluation.
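The re-arranged permutation and its inverse can be exercised on a toy instance. The sketch below is illustrative only: it stores the permutation τ as a full lookup table rather than lazily sampling it, encodes $x \,\|\, z$ as `(x << c) | z`, and uses function names chosen for this example.

```python
import random


def pi_tau_f(tau, f, x, z, c):
    """Forward evaluation of pi[tau, f] on x || z."""
    y_star = min(tau[(x << c) | zz] for zz in range(2 ** c))   # 2^c queries to tau
    y_x = tau[(x << c) | f(x)]                                  # one query to f
    y = tau[(x << c) | z]
    if y == y_x:
        return y_star
    if y == y_star:
        return y_x
    return y


def pi_tau_f_inverse(tau, tau_inv, f, y, c):
    """Inverse evaluation of pi[tau, f] on y."""
    x = tau_inv[y] >> c                                         # tau^{-1}(y) = x || z
    y_star = min(tau[(x << c) | zz] for zz in range(2 ** c))
    y_x = tau[(x << c) | f(x)]
    if y == y_star:
        return tau_inv[y_x]
    if y == y_x:
        return tau_inv[y_star]
    return tau_inv[y]


# Toy parameters: n = 3, c = 2, with a seeded permutation tau and function f.
n, c = 3, 2
rng = random.Random(7)
tau = list(range(2 ** (n + c)))
rng.shuffle(tau)
tau_inv = [0] * len(tau)
for inp, out in enumerate(tau):
    tau_inv[out] = inp
f_table = [rng.randrange(2 ** c) for _ in range(2 ** n)]
f = f_table.__getitem__

# Full table of pi[tau, f], built from forward evaluations.
pi = [pi_tau_f(tau, f, v >> c, v & (2 ** c - 1), c) for v in range(2 ** (n + c))]
```

On this toy instance one can confirm that `pi` is a permutation, that the inverse routine undoes it, and that applying the minimum-selection construction to `pi` recovers `f`.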

Therefore, to prove perfect indifferentiability, it is enough to prove that
$(F_c[\pi], \pi)$ (for $\pi \leftarrow \mathsf{Perms}(n + c)$) and $(f, \pi[\tau, f])$ (for $f \leftarrow \mathsf{Fcs}(n, c)$ and
$\tau \leftarrow \mathsf{Perms}(n + c)$) are identically distributed. This can be done in two steps:


8 A previous version of this paper provided a somewhat more cumbersome yet
equivalent description of the simulator. The far more elegant description using $\pi[\tau, f]$ was
suggested by an anonymous reviewer we wish to thank.
1. First, note that $F_c[\pi[\tau, f]] = f$. This is because on input $x$, $F_c$ outputs $z$ such
that $\pi[\tau, f](x \,\|\, z)$ is smallest. This must be $z = f(x)$, because $\pi[\tau, f]$ is such
that $\pi[\tau, f](x \,\|\, f(x)) = y^*_x$, which is the smallest value among $\tau(x \,\|\, z')$, and
thus also among $\pi[\tau, f](x \,\|\, z')$.

2. Therefore, it suffices to show that the permutation $\pi[\tau, f]$ is uniformly
distributed. This is because $\pi[\tau, f]$ is obtained by sampling a random permutation
$\tau$, and then for all $x$, swapping $y^*_x$ with the output of $x \,\|\, z$ for a randomly
chosen $z = f(x)$. This gives a uniform random permutation.

This concludes the proof. □ 


C Proof of Lemma 3 


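For every $a \in S_{t,i}$, we now define two subsets partitioning $\{0,1\}^n \times \{0,1\}^n$,
i.e., the key space for round $t + 1$:

$$\mathcal{WK}^{+} := \{(w, k) : a \oplus k \in S_{t,i} \,\wedge\, \max\{a \oplus k, a\} \oplus w \notin \mathcal{Q}\}$$

$$\mathcal{WK}^{-} := \{(w, k) : a \oplus k \notin S_{t,i} \,\vee\, \max\{a \oplus k, a\} \oplus w \in \mathcal{Q}\}$$

It is easy to see that

$$|\mathcal{WK}^{+}| = N_i \cdot (N - Q) , \qquad |\mathcal{WK}^{-}| = N^2 - N_i \cdot (N - Q)$$

because for every $a$ we have exactly $|S_{t,i}| = N_i$ values of $k$ such that $a \oplus k \in S_{t,i}$,
and moreover, we have (for each such value $k$) exactly $N - Q$ possible values of
$w$ with $\max\{a, a \oplus k\} \oplus w \notin \mathcal{Q}$. Also, note that for $(w, k) \in \mathcal{WK}^{-}$,

$$E\Big[\Big(p_{t+1,i}(\varphi(a)) - \frac{1}{N_i}\Big)^2 \,\Big|\, K[t+1] = k, W[t+1] = w\Big] = \Big(p_{t,i}(a) - \frac{1}{N_i}\Big)^2 ,$$

whereas for $(w, k) \in \mathcal{WK}^{+}$,

$$E\Big[\Big(p_{t+1,i}(\varphi(a)) - \frac{1}{N_i}\Big)^2 \,\Big|\, K[t+1] = k, W[t+1] = w\Big] = \Big(\frac{p_{t,i}(a) + p_{t,i}(a \oplus k)}{2} - \frac{1}{N_i}\Big)^2 .$$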


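Putting all of this together, we obtain

$$\begin{aligned}
E\Big[\Big(p_{t+1,i}(\varphi(a)) - \frac{1}{N_i}\Big)^2\Big]
&= \frac{1}{N^2} \sum_{k,w} E\Big[\Big(p_{t+1,i}(\varphi(a)) - \frac{1}{N_i}\Big)^2 \,\Big|\, K[t+1] = k, W[t+1] = w\Big] \\
&= \frac{1}{N^2} \Bigg[ \sum_{(w,k) \in \mathcal{WK}^{-}} \Big(p_{t,i}(a) - \frac{1}{N_i}\Big)^2 + \sum_{(w,k) \in \mathcal{WK}^{+}} \Big(\frac{p_{t,i}(a) + p_{t,i}(a \oplus k)}{2} - \frac{1}{N_i}\Big)^2 \Bigg] \\
&= \Big(1 - \frac{N_i (N-Q)}{N^2}\Big) \Big(p_{t,i}(a) - \frac{1}{N_i}\Big)^2 + \frac{N-Q}{N^2} \sum_{y \in S_{t,i}} \Big(\frac{p_{t,i}(a) + p_{t,i}(y)}{2} - \frac{1}{N_i}\Big)^2 ,
\end{aligned}$$

where we have used the structure of $\mathcal{WK}^{+}$, and the fact that for every $y \in S_{t,i}$
there exists $k$ such that $a \oplus k = y$, and corresponding $N - Q$ values of $w$. In
particular, we can expand

$$\begin{aligned}
\sum_{y \in S_{t,i}} \Big(\frac{p_{t,i}(a) + p_{t,i}(y)}{2} - \frac{1}{N_i}\Big)^2
&= \frac{1}{4} \sum_{y \in S_{t,i}} \Big(\Big(p_{t,i}(a) - \frac{1}{N_i}\Big) + \Big(p_{t,i}(y) - \frac{1}{N_i}\Big)\Big)^2 \\
&= \frac{N_i}{4} \cdot \Big(p_{t,i}(a) - \frac{1}{N_i}\Big)^2 + \frac{1}{4} \Delta_{t,i} ,
\end{aligned}$$

where we have used in passing the fact that $\sum_{y \in S_{t,i}} (p_{t,i}(y) - \frac{1}{N_i}) = 0$. When we
plug this back into the above, we then get

$$E\Big[\Big(p_{t+1,i}(\varphi(a)) - \frac{1}{N_i}\Big)^2\Big] = \Big(1 - \frac{3 N_i (N-Q)}{4 N^2}\Big) \Big(p_{t,i}(a) - \frac{1}{N_i}\Big)^2 + \frac{1}{4} \cdot \frac{N-Q}{N^2} \cdot \Delta_{t,i} .$$

This concludes the proof of Lemma 3.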
Abstract. We provide a security analysis for full-state keyed Sponge
and full-state Duplex constructions. Our results can be used for making
a large class of Sponge-based authenticated encryption schemes more
efficient by concurrent absorption of associated data and message blocks.
In particular, we introduce and analyze a new variant of SpongeWrap
with almost free authentication of associated data. The idea of using
full-state message absorption for higher efficiency was first made explicit
in the donkeySponge MAC construction, but without any formal security
proof. Recently, Gaži, Pietrzak and Tessaro (CRYPTO 2015) have
provided a proof for the fixed-output-length variant of donkeySponge.
Yasuda and Sasaki (CT-RSA 2015) have considered partially full-state
Sponge-based authenticated encryption schemes for efficient incorporation
of associated data. In this work, we unify, simplify, and generalize
these results about the security and applicability of full-state keyed
Sponge and Duplex constructions; in particular, for designing more
efficient authenticated encryption schemes. Compared to the proof of Gaži
et al., our analysis directly targets the original donkeySponge construction
as an arbitrary-output-length function. Our treatment is also more
general than that of Yasuda and Sasaki, while yielding a more efficient
authenticated encryption mode for the case that associated data might
be longer than messages.


Keywords: Sponge construction • Duplex construction • Full-state 
absorption • Authenticated encryption • Associated data 


1 Introduction 

Since its introduction, the Sponge construction by Bertoni, Daemen, Peeters and
Van Assche [4] has faced an immense increase in popularity. As a "simple" hash
function mode, it is the foundation of the SHA-3 standard Keccak [5], but
its keyed variants have also become very popular modes of operation for a permutation
to build a wide spectrum of symmetric-key primitives: reseedable pseudorandom
number generators [7], pseudorandom functions and message authentication
codes (PRFs/MACs) [9,11], Extendable-Output Functions ("XOFs") [24]
and authenticated encryption (AE) modes [10,11]. The keyed Sponge principle
was also adopted in Spritz, a new RC4-like stream cipher [26], and in 10 out
of 57 submissions to the currently running CAESAR competition on authenticated
encryption [1,3]. These use cases reinforce the fact that Sponge-based
constructions will continue to play an important role, not only in the new hashing
standard SHA-3, but in various next-generation cryptographic algorithms.

The classical Sponge construction consists of a sequential application of a
permutation $p$ on a state of $b$ bits. This state is partitioned into an $r$-bit rate or
outer part and a $c$-bit capacity or inner part, where $b = r + c$. In the absorption
phase, message blocks of size $r$ bits are absorbed by the outer part and the
state is transformed using $p$, while in the squeezing phase, digests are extracted
from the outer part $r$ bits at a time. In the indifferentiability framework of
Maurer, Renner and Holenstein [20], Bertoni et al. [6] proved that the Sponge
construction is secure up to the $O(2^{c/2})$ birthday-type bound. The capacity
part is left untouched throughout the evaluation of the Sponge construction: a
violation of this paradigm would make the indifferentiability security result void.
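The absorb/squeeze flow of the classical Sponge can be sketched in a few lines. This is a toy illustration only: the 16-bit permutation `p` is invertible but in no way cryptographic, padding is omitted (the message is assumed to be already split into r-bit blocks), and placing the outer part in the high-order bits of the state is a convention chosen for this example.

```python
B, R = 16, 8              # toy state size b and rate r (assumptions for the example)
C = B - R                 # capacity c = b - r
MASK = (1 << B) - 1


def p(state):
    """Toy permutation on B-bit states (NOT cryptographic).

    Multiplication by an odd constant, addition, and rotation are each
    bijections modulo 2^B, so their composition is a permutation.
    """
    state = (state * 0x9E37 + 0x1234) & MASK
    return ((state << 5) | (state >> (B - 5))) & MASK


def sponge(blocks, out_blocks):
    """Absorb r-bit blocks into the outer part, then squeeze out_blocks r-bit outputs."""
    state = 0
    for m in blocks:                      # absorbing phase
        assert 0 <= m < (1 << R)
        state ^= m << C                   # only the outer (rate) part is modified
        state = p(state)
    out = []
    for _ in range(out_blocks):           # squeezing phase
        out.append(state >> C)            # read the outer part, r bits at a time
        state = p(state)
    return out


digest = sponge([0x12, 0x34], 3)
```

Note that the message blocks never touch the low C bits of the state directly; the full-state absorption studied in this paper removes exactly that restriction, which is why a separate security analysis is needed.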

In this work, we strive for optimality, and investigate the most efficient ways
of using Sponges for message authentication and authenticated encryption in a
provably secure manner. In both directions, we consider a generalization of the
currently known schemes to full-state absorption, the most efficient usage of the
underlying permutation, and we show that these schemes are secure. Due to the
full-state absorption, we can no longer rely on the classical indifferentiability
result of the Sponge (as was for instance done in [2,10]), and a new security
analysis is required. We will elaborate on both directions in the following.

Message Authentication. Bertoni et al. [9] introduced the keyed Sponge
as a simple evaluation of the Sponge function on the key and the message,
Sponge($K \,\|\, M$), and proved security beyond $O(2^{c/2})$. Chang et al. considered
a slight variant of the keyed Sponge where the key is processed in the inner
part of the Sponge, and observed that it can be seen as the Sponge based on
an Even-Mansour blockcipher. At FSE 2015, Andreeva, Daemen, Mennink and
Van Assche [2] considered a generic and improved analysis of both the outer-
and inner-keyed Sponge. So far, however, these constructions have only been
considered with the classical $r$-bit absorption.

The idea of using full-state message absorption for achieving higher efficiency
was first made explicit in the donkeySponge MAC construction [11],$^1$ but without
any formal security proof. The recently introduced Donkey-inspired MAC
function Chaskey [22] did get a formal security analysis, but its proof is tailored
to Chaskey and does not apply to the donkeySponge.

A thorough analysis of the full-state message absorption keyed Sponge had to
wait for Gaži, Pietrzak and Tessaro [17], who prove nearly tight security up to
$O(\ell q(q + N)/2^b + q(q + \ell + N)/2^c)$, where the adversary makes $q$ queries of
maximal length $\ell$, and makes $N$ primitive calls. However, their analysis only
applies to the fixed-output-length variant, and the proof does not directly seem
to extend to the original arbitrary-output-length keyed Sponge. In this work, we
provide a direct proof for this more general case.

1 We note that apart from full-state absorption, the donkeySponge also uses fewer
rounds in the underlying permutation during the absorbing phase.

In more detail, we present a generalized scheme, dubbed Full-state Keyed
Sponge (FKS), whose security implies the security of Donkey Sponge in the
ideal permutation setting, and prove that it is secure up to approximately
$\frac{2(q\ell)^2}{2^b} + \frac{2q^2\ell}{2^c} + \frac{\mu N}{2^k}$, where k is the size of the key, and $\mu$ is a parameter called
the “multiplicity”. We note that using the outer-keyed Sponge no
longer makes any difference compared with using the inner-keyed variant in the presence
of full-state absorption (see also Sect. 8). Our proof of FKS follows the modular
approach of Andreeva et al., but due to the full-state absorption, we cannot
rely on the indifferentiability result of [6], and present a new and more detailed
analysis.

Authenticated Encryption. Encryption via the Sponge can be done (and is
typically done) via the Duplex construction [10], a stateful construction consisting
of an initialization interface and a duplexing interface. The initialization
interface can be called to initialize an all-zero state; the duplexing interface
absorbs a message of size $< r$ bits and squeezes $\le r$ bits of the outer part.
The security of the Duplex traces back to the indifferentiability of the classical
Sponge, yielding an $O(2^{c/2})$ security bound.

Bertoni et al. [10] showed that the Duplex, in turn, allows for authenticated
encryption in the form of SpongeWrap. This mode is, de facto, the basis of the
majority of Sponge-based submissions to the CAESAR competition. Jovanovic
et al. [18] re-investigated Sponge-based authenticated encryption schemes, starring
NORX, and derived beyond birthday-bound security. These results are, however,
all for the usual r-bit absorption. Yasuda and Sasaki [27] have considered
several full-state and partially full-state Sponge-based authenticated encryption
schemes for efficient incorporation of associated data, directly lifting Jovanovic 
et al.’s security proofs. The concurrent absorption mode proposed by Yasuda and 
Sasaki (Fig. 3 in [27]) fails to utilize the full-state absorption when the associated 
data becomes longer than the message, forcing the mode switch from a full-state 
mode to the classical r-bit absorbing Sponge mode; hence, we refer to this as 
a partially full-state AE mode. Full-state data absorption was also proposed by 
Reyhanitabar, Vaudenay and Vizar [25] in their compression function based AE 
mode p-OMD. 

We generically aim to optimize the efficiency in Sponge-based authenticated 
encryption. To this end, we first formalize the Full-state Keyed Duplex (FKD) 
construction. It differs from the original Duplex in the fact that (i) the key is 
explicitly used to initialize the state (In this, the FKD is similar to the Monkey 
Duplex [11]) and (ii) the absorption is performed on the entire state. Note that 
the possibility to absorb in the entire state enforces the explicit usage of the 
key. Next, we prove that FKD is secure, i.e., indistinguishable from a
random oracle with the same interfaces. As before, we cannot rely on the classical
indifferentiability proof due to the full-state absorption; however, we show how
to adapt the FKS proof to a special case directly related to the security of FKD.
We exemplify the better absorption capabilities of FKD by the introduction
of a Full-state SpongeWrap (FSW). The FSW construction is more general than
that of Yasuda and Sasaki, who only considered specific AE constructions, and
interestingly, our approach also yields a more efficient (truly full-state) authenticated
encryption mode irrespective of the relative lengths of messages and their
associated data.

Organization of the Paper. Notations and preliminary concepts are presented
in Sect. 2. We present the Full-state Keyed Sponge and Full-state Keyed Duplex
in Sect. 3. The security model is discussed in Sect. 4. In Sect. 5 we prove security
of FKS and in Sect. 6 of FKD. The introduction of the Full-state SpongeWrap,
and the application of FKD to this construction, is given in Sect. 7. Section 8
provides a brief discussion on related-key security and our security models.

2 Notations and Conventions 

The set of all strings of length b is denoted as $\{0,1\}^b$ for any $b \ge 1$ and the set
of all finite strings of arbitrary length is denoted as $\{0,1\}^*$. We will denote the
empty string of length 0 as $\varepsilon$. For any positive b, we let $\{0,1\}^{<b} = \bigcup_{i=0}^{b-1} \{0,1\}^i$
denote the set of all strings of length less than b, including $\varepsilon$. For two strings $X, Y \in
\{0,1\}^*$ we let $X \| Y$ denote the string obtained by concatenation of X and Y.
For a string $X \in \{0,1\}^x$ we let $\mathrm{left}_\ell(X)$ denote the $\ell$ leftmost bits of X and
$\mathrm{right}_r(X)$ the r rightmost bits of X, such that $X = \mathrm{left}_y(X) \| \mathrm{right}_{x-y}(X)$ for
any $0 \le y \le x$. For integral b, r, c such that $b = r + c$, and for $t \in \{0,1\}^b$, we let
$\mathrm{outer}(t) = \mathrm{left}_r(t)$ and $\mathrm{inner}(t) = \mathrm{right}_c(t)$.

For a non-empty finite set S let $a \xleftarrow{\$} S$ denote sampling an element a from
S uniformly at random. We let $|Z|$ denote the cardinality if Z is a set and the
length if Z is a string. We let $\mathrm{Perm}(b)$ denote the set of all permutations of b-bit
strings and $\mathrm{Func}(b)$ the set of all functions over b-bit strings.

Given two strings X, Y, let

$$\mathrm{llcp}_b(X, Y) = \max_{i \ge 0} \{ i : \mathrm{left}_{ib}(X) = \mathrm{left}_{ib}(Y) \}$$

denote the length of the longest common prefix between X and Y in b-bit blocks.
For a string X and a non-empty set of strings $\{Y_1, \ldots, Y_n\}$ let

$$\mathrm{llcp}_b(X; Y_1, \ldots, Y_n) = \max \{ \mathrm{llcp}_b(X, Y_1), \ldots, \mathrm{llcp}_b(X, Y_n) \}.$$

For any two pairs of integers $(i, j)$, $(i', j')$, we say that $(i', j') < (i, j)$ if either
$i' < i$ or if $i' = i$ and $j' < j$. We say that $(i', j') \le (i, j)$ if $(i', j') < (i, j)$ or
if $(i', j') = (i, j)$. In other words, we use lexicographical ordering to determine
the ordering of integer tuples.
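The notation of this section can be made concrete with a small Python sketch. It is an illustration only: bit strings are modelled as Python strings over '0' and '1' (an assumption of this sketch, not of the paper), and llcp counts only whole b-bit blocks that are present in both strings. The lexicographical ordering on integer pairs coincides with Python's built-in tuple comparison.

```python
def left(x: str, n: int) -> str:
    """The n leftmost bits of x."""
    return x[:n]


def right(x: str, n: int) -> str:
    """The n rightmost bits of x."""
    return x[len(x) - n:]


def outer(t: str, r: int) -> str:
    """Outer part of a state: its r leftmost bits."""
    return left(t, r)


def inner(t: str, c: int) -> str:
    """Inner part of a state: its c rightmost bits."""
    return right(t, c)


def llcp(b: int, x: str, *ys: str) -> int:
    """Longest common prefix of x with any of ys, counted in b-bit blocks.

    Assumption of this sketch: only complete blocks contained in both
    strings are compared.
    """
    best = 0
    for y in ys:
        i = 0
        limit = min(len(x), len(y))
        while (i + 1) * b <= limit and x[i * b:(i + 1) * b] == y[i * b:(i + 1) * b]:
            i += 1
        best = max(best, i)
    return best
```

With these helpers, `left(X, y) + right(X, len(X) - y)` reproduces X for every valid y, mirroring the identity stated above.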

3 Sponge Constructions 

3.1 Full-State Keyed Sponge 

We consider the Full-state Keyed Sponge (FKS) construction, which uses a
public permutation $p : \{0,1\}^b \to \{0,1\}^b$. It is furthermore parameterized with
r, k, which are required to satisfy $r < b$ and $k \le b - r =: c$. The parametrization
is sometimes left implicit if it is clear from the context. FKS gets as input a key
$K \in \{0,1\}^k$, a message $M \in \{0,1\}^*$, and a natural number z, and it outputs a
string $Z \in \{0,1\}^z$:

$$\mathrm{FKS}^p(K, M, z) = \mathrm{FKS}^p_K(M, z) = Z.$$

It operates on a state $t \in \{0,1\}^b$, which is initialized using the key K. The
message M is first padded to a length that is a multiple of b bits, using $\mathrm{pad}_b(M) =
M \| 1 0^{b-1-(|M| \bmod b)}$, which is then viewed as m b-bit message blocks $M^1 \| \ldots \| M^m$ (see footnote 2).
These message blocks are processed one by one, interleaved with evaluations of p.
After the absorption of M, the outer r bits of the state are output and the state
is processed via p until a sufficient number of output bits is obtained. FKS is
depicted in Fig. 1, and Algorithm 1 provides a formal specification of FKS.
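The description above can be sketched in Python as follows. This is a toy model under stated assumptions, not a reference implementation: bit strings are Python strings over '0' and '1', and `make_toy_permutation` is a hypothetical helper that builds a seeded random permutation over a very small state (for example b = 8) to stand in for the public permutation p.

```python
import random


def make_toy_permutation(b: int, seed: int = 0):
    """Hypothetical stand-in for p: a seeded random permutation of b-bit strings."""
    table = list(range(2 ** b))
    random.Random(seed).shuffle(table)
    return lambda s: format(table[int(s, 2)], f"0{b}b")


def xor(x: str, y: str) -> str:
    """Bitwise XOR of two equal-length bit strings."""
    return "".join("1" if u != v else "0" for u, v in zip(x, y))


def pad(b: int, m: str) -> str:
    """pad_b(M) = M || 1 || 0^(b - 1 - (|M| mod b))."""
    return m + "1" + "0" * (b - 1 - len(m) % b)


def fks(p, b: int, r: int, key: str, msg: str, z: int) -> str:
    """Full-state Keyed Sponge: absorb b-bit blocks, squeeze r bits per call."""
    k = len(key)
    assert r < b and k <= b - r
    t = "0" * (b - k) + key            # state initialised with the key
    padded = pad(b, msg)
    for i in range(0, len(padded), b):
        s = xor(t, padded[i:i + b])    # full-state absorption of one block
        t = p(s)
    out = t[:r]                        # first r output bits
    while len(out) < z:
        t = p(t)                       # squeezing phase
        out += t[:r]
    return out[:z]
```

For example, `fks(make_toy_permutation(8), 8, 3, "1011", "110", 10)` returns a 10-bit string; asking for fewer bits returns a prefix of the same string.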


Fig. 1. The FKS construction.


Algorithm 1. FKS[p, r, k](K, M, z)

    t ← 0^{b-k} || K
    M^1 || ... || M^m ← pad_b(M)
    for i = 1, ..., m do
        s ← t ⊕ M^i
        t ← p(s)
    Z ← left_r(t)
    while |Z| < z do
        t ← p(t)
        Z ← Z || left_r(t)
    return left_z(Z)

Algorithm 2. FKD[p, r, k]

    Interface FKD.initialize(K)
        t ← 0^{b-k} || K

    Interface FKD.duplexing(M, z)
        if z > r or |M| ≥ b then
            return ⊥
        s ← t ⊕ pad_b(M)
        t ← p(s)
        return left_z(t)

2 In fact, any injective padding function works, as long as the last block is always
non-zero.

Fig. 2. The FKD construction.

3.2 Full-State Keyed Duplex 

We present the Full-state Keyed Duplex (FKD) construction, a generalization
of the Duplex of Bertoni et al. [8,10]. FKD is also parameterized by a public
permutation $p : \{0,1\}^b \to \{0,1\}^b$ and values r, k, which are required to satisfy
$r < b$ and $k \le b - r =: c$. Again, the parametrization is sometimes left implicit
if clear from the context. An instance of FKD, denoted by D, consists of two
interfaces: D.initialize and D.duplexing. D.initialize gets as input a key $K \in
\{0,1\}^k$ and outputs nothing, while D.duplexing gets as input a message $M \in
\{0,1\}^{<b}$ and a natural number $z \le r$, and it outputs a string $Z \in \{0,1\}^z$. FKD
is depicted in Fig. 2, and the formal specification is given in Algorithm 2. FKD
is a generalization of FKS where D.initialize is used to initialize the state, and
messages are absorbed into the state and/or digests are squeezed out of the state
using D.duplexing calls.
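The two interfaces can be sketched as a small stateful Python class. As before, this is only a toy model: bit strings are Python strings, `make_toy_permutation` is a hypothetical seeded permutation over a tiny state standing in for p, and the error symbol returned for invalid inputs is modelled as `None`.

```python
import random


def make_toy_permutation(b: int, seed: int = 0):
    """Hypothetical stand-in for p: a seeded random permutation of b-bit strings."""
    table = list(range(2 ** b))
    random.Random(seed).shuffle(table)
    return lambda s: format(table[int(s, 2)], f"0{b}b")


def xor(x: str, y: str) -> str:
    return "".join("1" if u != v else "0" for u, v in zip(x, y))


def pad(b: int, m: str) -> str:
    """pad_b(M) = M || 1 || 0^(b - 1 - (|M| mod b))."""
    return m + "1" + "0" * (b - 1 - len(m) % b)


class FKD:
    """Toy Full-state Keyed Duplex with initialize and duplexing interfaces."""

    def __init__(self, p, b: int, r: int):
        assert r < b
        self.p, self.b, self.r = p, b, r
        self.t = None

    def initialize(self, key: str) -> None:
        assert len(key) <= self.b - self.r
        self.t = "0" * (self.b - len(key)) + key

    def duplexing(self, msg: str, z: int):
        if z > self.r or len(msg) >= self.b:
            return None                          # invalid call, state untouched
        s = xor(self.t, pad(self.b, msg))        # absorb into the full state
        self.t = self.p(s)
        return self.t[:z]                        # squeeze at most r outer bits
```

Calling `initialize` again with the same key resets the state, so the same sequence of duplexing calls reproduces the same outputs.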

4 Security Models and Tools 

Multiplicity. Let $\{(x_i, y_i)\}_{i=1}^{\sigma}$ be a set of $\sigma$ evaluations of a permutation p.
Following Andreeva et al. [2], we define the total maximal multiplicity as $\mu =
\mu_{\mathrm{fwd}} + \mu_{\mathrm{bwd}}$, where

$$\mu_{\mathrm{fwd}} = \max_a |\{ i \in \{1, \ldots, \sigma\} : \mathrm{outer}(x_i) = a \}|,$$

$$\mu_{\mathrm{bwd}} = \max_a |\{ i \in \{1, \ldots, \sigma\} : \mathrm{outer}(y_i) = a \}|.$$

The multiplicity is a quantity that characterises the data that are available
to the adversary during the attack. We have $2 \le \mu \le 2\sigma$ by definition; however,
the upper bound $2\sigma$ is never reached in practical applications of sponge-based
constructions. Being a sum of forward and backward multiplicities, the total
multiplicity can be seen as a measure of the adversary’s ability to control the outer
part of the permutation inputs and outputs, respectively. In the case of sponge-based
designs, the backward multiplicity can be expected to be approximately $\sigma 2^{-r}$,
while the forward multiplicity varies with concrete applications [2].
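As a small illustration of this definition, the following Python sketch computes the total maximal multiplicity from a list of input/output pairs. The representation of states as bit strings and the function name `multiplicity` are assumptions of the sketch.

```python
from collections import Counter


def multiplicity(evals, r: int) -> int:
    """Total maximal multiplicity of a non-empty list of (x, y) evaluations.

    x and y are b-bit strings with y = p(x); the outer part is the first r bits.
    """
    fwd = max(Counter(x[:r] for x, _ in evals).values())  # most frequent outer(x)
    bwd = max(Counter(y[:r] for _, y in evals).values())  # most frequent outer(y)
    return fwd + bwd
```

For three evaluations whose inputs share the outer part twice and whose outputs have pairwise distinct outer parts, the function returns 2 + 1 = 3, which lies between 2 and 2σ = 6 as stated above.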
4.1 Adversaries and Patarin’s Coefficient-H Technique

We consider an information-theoretic adversary A that has access to one or
more oracles X; this is denoted by $A^X$, and the notation $A^X \Rightarrow 1$ means that
A, after interaction with X, returns 1. It is a classical fact (for a simple proof
see [14]) that in the information-theoretic setting, adversaries can be assumed
to be deterministic without loss of generality.

We use Patarin’s Coefficient-H technique [23]; more precisely, a revisited
formulation of it by Chen and Steinberger [14]. Consider a deterministic
information-theoretic adversary A whose goal is to distinguish two oracles X
and Y:

$$\Delta_A(X; Y) = \left| \Pr[A^X \Rightarrow 1] - \Pr[A^Y \Rightarrow 1] \right|.$$

Here, X and Y are randomized algorithms; the randomization depends on the
specific scenario and for now is left implicit. The interaction with any of the two
systems X or Y is summarized in a transcript $\tau$. Denote by $D_X$ the probability
distribution of transcripts when interacting with X, and similarly, by $D_Y$ the distribution
of transcripts when interacting with Y. A transcript $\tau$ is called attainable
if $\Pr[D_Y = \tau] > 0$, meaning that it can occur during interaction with Y. Denote
by $\mathcal{T}$ the set of all attainable transcripts. The Coefficient-H technique states the
following, for the proof of which we refer to [14].


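Lemma 1 (Coefficient-H Technique [14,23]). Consider a fixed deterministic
adversary A. Let $\mathcal{T} = \mathcal{T}_{\mathrm{good}} \cup \mathcal{T}_{\mathrm{bad}}$ be a partition into good transcripts $\mathcal{T}_{\mathrm{good}}$
and bad transcripts $\mathcal{T}_{\mathrm{bad}}$. If there exists an $\varepsilon$ such that for all $\tau \in \mathcal{T}_{\mathrm{good}}$,

$$\frac{\Pr[D_X = \tau]}{\Pr[D_Y = \tau]} \ge 1 - \varepsilon,$$

then $\Delta_A(X; Y) \le \varepsilon + \Pr[D_Y \in \mathcal{T}_{\mathrm{bad}}]$.

The two parts of $\mathcal{T}$ are labeled as $\mathcal{T}_{\mathrm{good}}$ and $\mathcal{T}_{\mathrm{bad}}$ to aid the intuitiveness
of the proof. The transcripts in $\mathcal{T}_{\mathrm{good}}$ are “good” in the sense that they give
us a high value of $\Pr[D_X = \tau] / \Pr[D_Y = \tau]$ and thus a small $\varepsilon$, while the “bad”
transcripts from $\mathcal{T}_{\mathrm{bad}}$ fail to do so.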
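The inequality of Lemma 1 can be checked numerically on small, explicitly given transcript distributions. The sketch below is purely illustrative: the distributions are made up, the function names are hypothetical, and the statistical distance between the two transcript distributions is used as an upper bound on the advantage of any distinguisher.

```python
from fractions import Fraction


def coefficient_h_bound(dx, dy, bad):
    """Return eps + Pr[D_Y in bad] for transcript distributions dx, dy.

    dx, dy map transcripts to probabilities; bad is the set of bad transcripts.
    eps is the smallest non-negative value with Pr_X/Pr_Y >= 1 - eps on all
    good attainable transcripts.
    """
    attainable = [t for t in dy if dy[t] > 0]
    good = [t for t in attainable if t not in bad]
    eps = max([0] + [1 - dx.get(t, 0) / dy[t] for t in good])
    return eps + sum(dy[t] for t in attainable if t in bad)


def statistical_distance(dx, dy):
    """Statistical distance, an upper bound on any distinguishing advantage."""
    keys = set(dx) | set(dy)
    return sum(abs(dx.get(t, 0) - dy.get(t, 0)) for t in keys) / 2


# Made-up example distributions over three transcripts.
DX = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
DY = {"a": Fraction(1, 2), "b": Fraction(3, 8), "c": Fraction(1, 8)}
```

Declaring transcript "b" bad gives the bound 3/8, and declaring nothing bad gives 1/3; both dominate the actual distance 1/8, in line with the lemma.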


4.2 Security Models for FKS and FKD 

Let $RO^\infty : \{0,1\}^* \to \{0,1\}^\infty$ be a random oracle which takes inputs of arbitrary
but finite length and returns random infinite strings, where each output bit is
selected uniformly and independently for every input M.

Let F be either FKS or FKD, which is based on a permutation $p : \{0,1\}^b \to
\{0,1\}^b$ and a key $K \in \{0,1\}^k$. We will define the security of F in two settings: the
public permutation setting, where the adversary has query access to the permutation
(security comes from the secrecy of K), and the secret permutation setting
(with no explicit key K), where the adversary has no access to the underlying
permutation and the security comes from the secrecy of the permutation.
We use the notations $F^p_K$ and $F^\pi_0$ to refer to the public permutation and
secret permutation based schemes, respectively, where $\pi$ is a secret random
permutation.

In both settings, we consider an adversary that aims to distinguish the real F
from an ideal (reference) primitive, namely an oracle RO with the same interface. For
F = FKS the corresponding ideal primitive is defined by $RO_{\mathrm{FKS}}(M, z) =
\mathrm{left}_z(RO^\infty(M))$. For F = FKD the corresponding reference primitive $RO_{\mathrm{FKD}}$
is a stateful oracle with two interfaces: (1) $RO_{\mathrm{FKD}}$.initialize() that initializes
the state of the oracle, St, to the empty string, and (2) $RO_{\mathrm{FKD}}$.duplexing(M, z)
that, on input $M \in \{0,1\}^{<b}$ and a natural number z, first updates the state as
$St \leftarrow St \| \mathrm{pad}_b(M)$ and then outputs $\mathrm{left}_z(RO^\infty(St))$.

We define the distinguishing advantage of any adversary A against F based
on a public permutation by

$$\mathbf{Adv}^{\mathrm{ind}}_{F^p_K, p}(A) = \left| \Pr\left[ K \xleftarrow{\$} \{0,1\}^k, p \xleftarrow{\$} \mathrm{Perm}(b) : A^{F^p_K, p, p^{-1}} \Rightarrow 1 \right] - \Pr\left[ p \xleftarrow{\$} \mathrm{Perm}(b) : A^{RO_F, p, p^{-1}} \Rightarrow 1 \right] \right|.$$

The distinguishing advantage of A against F based on a secret permutation is
defined by

$$\mathbf{Adv}^{\mathrm{ind}}_{F^\pi_0}(A) = \left| \Pr\left[ \pi \xleftarrow{\$} \mathrm{Perm}(b) : A^{F^\pi_0} \Rightarrow 1 \right] - \Pr\left[ A^{RO_F} \Rightarrow 1 \right] \right|.$$

The resource-parameterized advantage functions are defined as usual. Let
$\mathbf{Adv}^{\mathrm{ind}}_{F^p_K, p}(q, \ell, \mu, N) = \max_A \mathbf{Adv}^{\mathrm{ind}}_{F^p_K, p}(A)$ be the maximum advantage over all
adversaries that make q queries to the left oracle, each inducing at most $\ell$ permutation
calls, if F = FKS, or that make at most q initialize() calls to the left
oracle and issue at most $\ell$ duplexing queries after each initialization if F = FKD,
with total maximal multiplicity $\mu$ in both cases, and that make N direct queries
to the public permutation. To simplify the analysis, we assume that each of the
q oracle queries in fact induces exactly $\ell$ permutation calls (or that the adversary
indeed makes $\ell$ duplexing calls after each initialization). This is without loss
of generality, as it can simply be achieved by giving extra squeezing outputs to
the adversary. Similarly, we define $\mathbf{Adv}^{\mathrm{ind}}_{F^\pi_0}(q, \ell, \mu) = \max_A \mathbf{Adv}^{\mathrm{ind}}_{F^\pi_0}(A)$, noticing
that in this case N = 0, thus it is omitted from the resources.


4.3 Security Model for Even-Mansour 

Our proof relies on a reduction to the security of a low-entropy single-key
Even-Mansour construction [15,16]. In more detail, let $p : \{0,1\}^b \to \{0,1\}^b$
be a permutation and $K \in \{0,1\}^k$ be a key. The Even-Mansour blockcipher is
defined as

$$E^p_K(M) = p(M \oplus (0^{b-k} \| K)) \oplus (0^{b-k} \| K).$$

We define the distinguishing advantage of any adversary A against E based on
a public permutation p as

$$\mathbf{Adv}^{\mathrm{prp}}_{E^p_K, p}(A) = \left| \Pr\left[ K \xleftarrow{\$} \{0,1\}^k, p \xleftarrow{\$} \mathrm{Perm}(b) : A^{E^p_K, p, p^{-1}} \Rightarrow 1 \right] - \Pr\left[ \pi, p \xleftarrow{\$} \mathrm{Perm}(b) : A^{\pi, p, p^{-1}} \Rightarrow 1 \right] \right|.$$

Let $\mathbf{Adv}^{\mathrm{prp}}_{E^p_K, p}(q, \mu, N) = \max_A \mathbf{Adv}^{\mathrm{prp}}_{E^p_K, p}(A)$ be the maximum advantage over all
adversaries that make q queries to the left oracle, with total maximal multiplicity
$\mu$, and that make N direct queries to the public permutation.


5 Security Analysis of FKS 


We prove the following result for FKS: 

Theorem 1. Let b, r, c, k > 0 be such that $b = r + c$ and $k \le c$. Let FKS be the
scheme of Sect. 3.1. Then,

$$\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKS}^p_K, p}(q, \ell, \mu, N) \le \frac{2(q\ell)^2}{2^b} + \frac{2q^2\ell}{2^c} + \frac{\mu N}{2^k}.$$

The proof follows to a certain extent the modular approach of [2], and in particular
also uses the observation that $\mathrm{FKS}^p_K$ can alternatively be considered as
$\mathrm{FKS}^{E^p_K}_0$, a clever observation used before by Chang et al. [13]. Note that this
observation only works for $k \le c$: it consists of xoring two dummy keys $K \oplus K$
in-between every two adjacent permutation calls, and if $k > c$ this would entail a
difference in the squeezing blocks of FKS. This trick splits the security of $\mathrm{FKS}^p_K$
into the security of the Even-Mansour blockcipher and the security of FKS with
secret primitive. Looking back at [2], the security of the Inner-keyed Sponge/Outer-keyed
Sponge [2] with secret permutations was simply reduced to the classical
indifferentiability result of [6]. Because this is a rather loose approach, and additionally
because the indifferentiability bound cannot be used for FKS due to its
full-state absorption, we consider the security of FKS with secret primitive in
more detail and derive an improved bound.
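The observation that keying the state is equivalent to running the zero-key sponge on top of the Even-Mansour blockcipher can be checked on a toy instance. The sketch below assumes bit strings as Python strings and a hypothetical seeded toy permutation over b = 8 bits; it compares FKS with key K over p against FKS with the all-zero key over the Even-Mansour cipher built from p and K, for a key length k not exceeding c.

```python
import random


def make_toy_permutation(b: int, seed: int = 0):
    """Hypothetical stand-in for p: a seeded random permutation of b-bit strings."""
    table = list(range(2 ** b))
    random.Random(seed).shuffle(table)
    return lambda s: format(table[int(s, 2)], f"0{b}b")


def xor(x: str, y: str) -> str:
    return "".join("1" if u != v else "0" for u, v in zip(x, y))


def pad(b: int, m: str) -> str:
    return m + "1" + "0" * (b - 1 - len(m) % b)


def fks(p, b: int, r: int, key: str, msg: str, z: int) -> str:
    """Full-state Keyed Sponge over the primitive p (toy sketch)."""
    assert r < b and len(key) <= b - r
    t = "0" * (b - len(key)) + key
    padded = pad(b, msg)
    for i in range(0, len(padded), b):
        t = p(xor(t, padded[i:i + b]))
    out = t[:r]
    while len(out) < z:
        t = p(t)
        out += t[:r]
    return out[:z]


def even_mansour(p, b: int, key: str):
    """E_K(M) = p(M xor (0^(b-k) || K)) xor (0^(b-k) || K)."""
    kk = "0" * (b - len(key)) + key
    return lambda m: xor(p(xor(m, kk)), kk)


def same_outputs(b: int, r: int, key: str, msgs, z: int, seed: int = 0) -> bool:
    """Check FKS^p_K(M, z) == FKS^{E_K}_0(M, z) for every message in msgs."""
    p = make_toy_permutation(b, seed)
    e = even_mansour(p, b, key)
    zero_key = "0" * len(key)
    return all(fks(p, b, r, key, m, z) == fks(e, b, r, zero_key, m, z) for m in msgs)
```

The check succeeds because the two internal states differ by the padded key in every step, and this difference is confined to the inner part whenever k is at most c, so it never reaches the squeezed outer bits.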

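Proof (Proof of Theorem 1). Consider any adversary A with resources $(q, \ell,
\mu, N)$. Note that $\mathrm{FKS}^p_K = \mathrm{FKS}^{E^p_K}_0$. Therefore, by a modular argument,

$$\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKS}^p_K, p}(A) = \Delta_A(\mathrm{FKS}^{E^p_K}_0, p; RO_{\mathrm{FKS}}, p)$$

$$\le \Delta_B(\mathrm{FKS}^\pi_0, p; RO_{\mathrm{FKS}}, p) + \Delta_C(E^p_K, p; \pi, p)$$

$$= \mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKS}^\pi_0}(B) + \mathbf{Adv}^{\mathrm{prp}}_{E^p_K, p}(C)$$

for some adversary B with resources $(q, \ell, \mu)$ and adversary C with resources
$(q\ell, \mu, N)$. Note that B also has access to p, but queries to this oracle are meaningless
as its left oracle ($\mathrm{FKS}^\pi_0$ or $RO_{\mathrm{FKS}}$) is independent of p.

In [2], it is proven that $\mathbf{Adv}^{\mathrm{prp}}_{E^p_K, p}(C) \le \frac{\mu N}{2^k}$ for any C. In Lemma 2, we prove
that $\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKS}^\pi_0}(B) \le \frac{2(q\ell)^2}{2^b} + \frac{2q^2\ell}{2^c}$ for any adversary B. □

Lemma 2. Let b, r, c > 0 be such that $b = r + c$. Let FKS be the scheme of
Sect. 3.1. Then,

$$\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKS}^\pi_0}(q, \ell, \mu) \le \frac{2(q\ell)^2}{2^b} + \frac{2q^2\ell}{2^c}.$$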


Proof. Given that the padding is publicly known and injective, we can generalize
the setting, and assume that the i-th query $M_i$ has length divisible by b and that
$M_i^{m_i} \ne 0^b$, i.e., we assume that all the queries are already padded. In more detail,
for $1 \le i \le q$, we let $m_i = |M_i|/b$ and $M_i = M_i^1 \| M_i^2 \| \ldots \| M_i^{m_i}$ such that $|M_i^j| = b$
for $1 \le j \le m_i$. We further assume that the adversary always asks for output of
length divisible by r and that every query induces exactly $\ell$ primitive calls. This
is without loss of generality: we can simply output “free bits” to the adversary.
We will denote the b-bit state of FKS just before the j-th application of $\pi$ is made
when processing the i-th query as $s_i^j$ for $1 \le j \le \ell$. Similarly, we will denote
the b-bit state of FKS just after the j-th application of $\pi$ in the i-th query as $t_i^j$ for
$1 \le j \le \ell$. We will call the former in-states and the latter out-states. Note that
every in-state $s_i^j$ is determined by the out-state $t_i^{j-1}$ and the block of the query $M_i^j$
as $s_i^j = t_i^{j-1} \oplus M_i^j$ in the absorbing phase, or just by $t_i^{j-1}$ in the squeezing phase,
as depicted in Fig. 3.

To simplify the further analysis we additionally define initial dummy
out-states $t_i^0 = 0^b$ and extended queries $\bar{M}_i = M_i \| 0^{b(\ell - m_i)}$ for $1 \le i \le q$. Now
we can express every in-state, be it absorbing or squeezing, as $s_i^j = t_i^{j-1} \oplus \bar{M}_i^j$.
We will group the out-states of the i-th query as $T_i = \{t_i^0, t_i^1, \ldots, t_i^\ell\}$. Because each
query induces exactly $\ell$ calls to $\pi$, we know that a query $M_i$ will be answered
by a string $Z_i = Z_i^1 \| \ldots \| Z_i^{z_i}$ with $z_i = \ell - m_i + 1$ and $|Z_i^j| = r$ for $1 \le j \le z_i$.
In particular, we have that $Z_i^j = \mathrm{outer}(t_i^{m_i + j - 1})$.

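The RP-RF Switch. We start by replacing the random permutation $\pi \xleftarrow{\$}
\mathrm{Perm}(b)$ by a random function $f \xleftarrow{\$} \mathrm{Func}(b)$ in the experiment. This will contribute
the term $(q\ell)^2/2^b$ to the final bound by a standard hybrid argument, so
we have $\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKS}^\pi_0}(q, \ell, \mu) \le \mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKS}^f_0}(q, \ell, \mu) + (q\ell)^2/2^b$.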


Fig. 3. Processing the i-th query.

Patarin’s Coefficient-H Technique. We will use the coefficient-H technique
to show that $\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKS}^f_0}(q, \ell, \mu) \le (q\ell)^2/2^b + 2q^2\ell/2^c$. The two systems an adversary
is trying to distinguish are $\mathrm{FKS}^f_0$ and $RO_{\mathrm{FKS}}$. We will refer to the former as
X and to the latter as Y. In either of the worlds, the adversary makes q queries
$M_1, \ldots, M_q$ and learns the responses $Z_1, \ldots, Z_q$. The transition from queries $M_i$
to $\bar{M}_i$ is injective, and additionally the length $m_i$ of $M_i$ is implicit from $\bar{M}_i$.
Therefore, we can summarize the interaction of the adversary with its oracle (X
or Y) with a transcript $(\bar{M}_1, \ldots, \bar{M}_q, Z_1, \ldots, Z_q)$.

To facilitate the analysis, we will disclose additional information $T_1, \ldots, T_q$
to the adversary at the end of the experiment. In the real world, these are the
out-states $T_i = \{t_i^0, t_i^1, \ldots, t_i^\ell\}$ as discussed in the beginning of the proof. In
the ideal world, these are dummy variables that satisfy the following intrinsic
properties of the Sponge construction:

1. $t_i^0 = 0^b$ for $1 \le i \le q$,

2. if $\mathrm{llcp}_b(\bar{M}_i, \bar{M}_{i'}) = n$ for $1 \le i, i' \le q$ then $t_i^j = t_{i'}^j$ for $1 \le j \le n$,

3. $\mathrm{outer}(t_i^{m_i + j - 1}) = Z_i^j$ for $1 \le i \le q$ and $1 \le j \le z_i$,

but are perfectly random otherwise. Note that in both worlds, $Z_1, \ldots, Z_q$ are fully
determined by $T_1, \ldots, T_q$, so we can drop them from the transcript. Thus a transcript
of the adversary’s interaction with FKS will be $\tau = (\bar{M}_1, \ldots, \bar{M}_q, T_1, \ldots, T_q)$.

With respect to Lemma 1, we will show that there exists a definition of bad
transcripts $\mathcal{T}_{\mathrm{bad}}$ such that $\Pr[D_X = \tau] / \Pr[D_Y = \tau] = 1$ for any $\tau \in \mathcal{T}_{\mathrm{good}} =
\mathcal{T} \setminus \mathcal{T}_{\mathrm{bad}}$, and thus $\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKS}^f_0}(q, \ell, \mu) \le \Pr[D_Y \in \mathcal{T}_{\mathrm{bad}}]$.

Definition of a Bad Transcript. Stated formally, a transcript $\tau$ is labeled as
bad if

$$\exists\, (1,1) \le (i',j'), (i,j) \le (q,\ell) \text{ such that: } \big( j \ne j' \,\lor\, \mathrm{llcp}_b(\bar{M}_i, \bar{M}_{i'}) < j = j' \le \ell \big) \text{ and } t_i^{j-1} \oplus \bar{M}_i^j = t_{i'}^{j'-1} \oplus \bar{M}_{i'}^{j'}. \qquad (1)$$

This formalization of a bad transcript comes with an intuitive, informal interpretation:
as long as all relevant inputs $s_i^j = t_i^{j-1} \oplus \bar{M}_i^j$ to the random function
f induced by the Sponge function are distinct, the output of the Sponge will
be distributed uniformly. We do not require uniqueness of all in-states because
the adversary can trivially force their repetition by issuing queries with common
prefixes, as we have argued earlier. However, these collisions are not a problem
because uniqueness of the queries implies that $\mathrm{llcp}_b(\bar{M}_i, \bar{M}_{i'}) < \max\{m_i, m_{i'}\}$
for any two queries $M_i, M_{i'}$. Even if the adversary truncates an old query and
thus forces an old absorbing in-state s to be squeezed for output, it is still not
a problem because the adversary has not seen the image f(s) before. Note that
although in-states do not exist in the ideal world, they can be defined by the same
relation as in the real world, i.e., $s_i^j = t_i^{j-1} \oplus \bar{M}_i^j$.

Bounding the Ratio of Probabilities of Good Transcripts. In the ideal
world, the out-states $t_i^0$ are always assigned a value trivially. Beside that,
we will also trivially assign a single randomly sampled value to multiple state
variables that are affected by the common prefixes of the queries. The remaining
out-states are sampled uniformly at random. It follows that there are exactly
$\eta(\tau) = \sum_{i=1}^{q} \big( \ell - \mathrm{llcp}_b(\bar{M}_i; \bar{M}_1, \ldots, \bar{M}_{i-1}) \big)$ b-bit values in any transcript $\tau$ that
are sampled independently and uniformly. We thus have $\Pr[D_Y = \tau] = 2^{-\eta(\tau) b}$
for any $\tau$.

Let $\Omega_X$ be the set of all possible real-world oracles. We have that $|\Omega_X| = 2^{b 2^b}$.
Let $\mathrm{comp}_X(\tau) \subseteq \Omega_X$ be the set of all oracles compatible with the transcript
$\tau$, i.e., the set of the real-world oracles that are capable of producing $\tau$ in an
experiment. We will compute the probability of seeing $\tau$ in the real world as
$\Pr[D_X = \tau] = |\mathrm{comp}_X(\tau)| / |\Omega_X|$. Note that a real-world oracle is completely
determined by the underlying function f.

If $\tau \in \mathcal{T}_{\mathrm{good}}$, then every in-state $s_i^j = t_i^{j-1} \oplus \bar{M}_i^j$ that does not trivially
collide with some other in-state $s_{i'}^j$, due to a common prefix of $\bar{M}_i$ and $\bar{M}_{i'}$, must
be distinct. The number of domain points of f that have an image assigned by
$\tau$ is easily seen to be $\eta(\tau) = \sum_{i=1}^{q} \big( \ell - \mathrm{llcp}_b(\bar{M}_i; \bar{M}_1, \ldots, \bar{M}_{i-1}) \big)$. A compatible
function f can therefore have arbitrary image values on the remaining $2^b - \eta(\tau)$
domain points. Thus we compute $|\mathrm{comp}_X(\tau)| = 2^{b(2^b - \eta(\tau))}$ and

$$\Pr[D_X = \tau] = \frac{|\mathrm{comp}_X(\tau)|}{|\Omega_X|} = \frac{2^{b(2^b - \eta(\tau))}}{2^{b 2^b}} = 2^{-\eta(\tau) b} = \Pr[D_Y = \tau].$$


Bounding the Probability of a Bad Transcript in the Ideal World. We
can bound the probability of $\tau$ being bad (cf. (1)) by first bounding the collision
probability of an arbitrary but fixed pair of in-states $s_i^j, s_{i'}^{j'}$ (i.e., the event $s_i^j = s_{i'}^{j'}$
occurs) and then summing this probability over all possible values of $(i, j), (i', j')$
with $(i', j') \ne (i, j)$. Because this probability varies significantly, we will split
all in-states into three classes and bound probabilities of individual collisions
between these classes.

We will associate to each in-state $s_i^j$ a label $\mathrm{stamp}_i^j$. We set $\mathrm{stamp}_i^j$ = free
if $1 < j = \mathrm{llcp}_b(\bar{M}_i; \bar{M}_1, \ldots, \bar{M}_{i-1}) + 1 \le m_i$ such that $m_{i^*} < j$ for some
$i^* < i$. We will set $\mathrm{stamp}_i^1$ = initial for $1 \le i \le q$ and $\mathrm{stamp}_i^j$ = fixed in
the remaining cases. Informally, we have $\mathrm{stamp}_i^j$ = free whenever the adversary
forces $\mathrm{outer}(t_i^{j-1}) = Z_{i^*}^{j - m_{i^*}}$ by reusing exactly the first $j - 1$ blocks of a
previous query $\bar{M}_{i^*}$ in $M_i$ and sets $M_i^j \ne \bar{M}_{i^*}^j = 0^b$. By doing this, it freely
but non-trivially chooses $\mathrm{outer}(s_i^j) = \mathrm{outer}(s_{i^*}^j \oplus \bar{M}_{i^*}^j \oplus M_i^j)$. Note that if the
adversary puts $M_i^j = \bar{M}_{i^*}^j$, this is not counted as a free state (the states will in
fact be the same). We have $\mathrm{stamp}_i^1$ = initial for the initial in-state of every
query.

As the condition (1) is symmetrical w.r.t. $(i, j)$ and $(i', j')$, and as it cannot
be satisfied if $(i, j) = (i', j')$, it can be rephrased as

$$\exists\, (1,1) \le (i',j') < (i,j) \le (q,\ell) \text{ such that: } \mathrm{llcp}_b(\bar{M}_i; \bar{M}_1, \ldots, \bar{M}_{i-1}) < j \le \ell, \quad s_i^j = s_{i'}^{j'}. \qquad (2)$$

Doing so is without loss of generality, as each $s_i^j$ with $j \le \mathrm{llcp}_b(\bar{M}_i; \bar{M}_1, \ldots, \bar{M}_{i-1})$
is identical to some previous state that has already been checked for collisions
with $s_{i'}^{j'}$ for every possible $(i', j')$. In the further analysis, we will be working with
(2) rather than with (1).

We will now bound the probability of collision of an arbitrary pair of in-states
$(s_i^j, s_{i'}^{j'}) = (t_i^{j-1} \oplus \bar{M}_i^j, t_{i'}^{j'-1} \oplus \bar{M}_{i'}^{j'})$ with $\mathrm{stamp}_i^j$ = fixed. We fix arbitrary
i and investigate the following three cases for j. In each case we treat every
$(i', j') < (i, j)$.

Case 1: $\mathrm{llcp}_b(\bar{M}_i; \bar{M}_1, \ldots, \bar{M}_{i-1}) + 1 < j \le m_i$. In this case, $t_i^{j-1}$ is
undetermined when the adversary issues the query $M_i$. This implies that it
will be independent of all $t_{i'}^{j'-1}$ for any $(i', j') < (i, j)$. The probability of
the collision $t_i^{j-1} \oplus M_i^j = t_{i'}^{j'-1} \oplus \bar{M}_{i'}^{j'}$ is easily seen to be $2^{-b}$.

Case 2: $\max\{\mathrm{llcp}_b(\bar{M}_i; \bar{M}_1, \ldots, \bar{M}_{i-1}) + 1, m_i\} < j \le \ell$. Here $t_i^{j-1} =
Z_i^{j - m_i} \| \mathrm{inner}(t_i^{j-1})$ and $\bar{M}_i^j = 0^b$. Although the adversary learns the
value of $Z_i^{j - m_i}$ during the experiment, this is independent of all $s_{i'}^{j'}$ with
$(i', j') < (i, j)$ (because $j + 1 > \mathrm{llcp}_b(\bar{M}_i; \bar{M}_1, \ldots, \bar{M}_{i-1})$). Even if $\mathrm{stamp}_{i'}^{j'} \in$
{free, initial} and $\mathrm{outer}(s_{i'}^{j'}) = a$ for some value a chosen by the adversary,
the collision $Z_i^{j - m_i} \| \mathrm{inner}(t_i^{j-1}) = a \| \mathrm{inner}(s_{i'}^{j'})$ happens with probability
$2^{-b}$.

Case 3: $j = \mathrm{llcp}_b(\bar{M}_i; \bar{M}_1, \ldots, \bar{M}_{i-1}) + 1$. If $j = \mathrm{llcp}_b(\bar{M}_i, \bar{M}_{i'}) + 1$, the
in-state $s_{i'}^{j}$, call it a twin-state of $s_i^j$, cannot collide with $s_i^j$, as by the
second trivial property $t_i^{j-1} = t_{i'}^{j-1}$ and by $j - 1 = \mathrm{llcp}_b(\bar{M}_i, \bar{M}_{i'})$ we have
$\bar{M}_i^j \ne \bar{M}_{i'}^j$. Note that if there was an $i^* < i$ with $m_{i^*} \le \mathrm{llcp}_b(\bar{M}_i, \bar{M}_{i^*}) = j - 1$
and $j \le m_i$ then we would have $\mathrm{stamp}_i^j$ = free. However, if we had the
same situation but with $j > m_i$ then $M_i$ and $M_{i^*}$ would be identical. So
$\mathrm{outer}(t_i^{j-1})$ has not been set and revealed to the adversary by any previous
output value, and for any non-twin in-state $s_{i'}^{j'}$, the probability of collision
is at most $2^{-b}$ by a similar argument as in Case 1.

There are no more than $q\ell$ choices for $(i, j)$ and no more than $q\ell$ possible $(i', j')$
for every $(i, j)$, so the overall probability that the condition (2) will be satisfied
due to a pair of in-states with $\mathrm{stamp}_i^j$ = fixed is at most $(q\ell)^2/2^b$.

If $\mathrm{stamp}_i^j$ = free then $\mathrm{outer}(s_i^j)$ is under the adversary’s control. However, the
value of $\mathrm{inner}(s_i^j)$ is always generated at the end of the experiment. By a
case analysis similar to the previous one we can verify that the probability of a
collision due to a pair of in-states with $\mathrm{stamp}_i^j$ = free is not bigger than $2^{-c}$.
It is apparent from the definition of a free in-state that there is at most one
such in-state for each query. Having $q\ell$ in-states in total, there are at most $q(q\ell)$
pairs with $\mathrm{stamp}_i^j$ = free and the probability of $\tau \in \mathcal{T}_{\mathrm{bad}}$ due to such a pair is
at most $q^2\ell/2^c$.

If $\mathrm{stamp}_i^j$ = initial (so j = 1) then $s_i^1$ cannot non-trivially collide with any other
initial in-state. A collision with a non-initial state $s_{i'}^{j'}$ implies that $t_{i'}^{j'-1} =
\bar{M}_{i'}^{j'} \oplus M_i^1$. If $j' > m_{i'}$ or if there is some $M_{i^*}$ with $m_{i^*} < j' \le \mathrm{llcp}_b(\bar{M}_{i'}, \bar{M}_{i^*}) + 1$,
then $\mathrm{outer}(t_{i'}^{j'-1})$ is known to the adversary. However, $\mathrm{inner}(t_{i'}^{j'-1})$ is always generated
at the end of the experiment. By a case analysis similar to the one we carried
out earlier, it can be verified that the collision $s_i^1 = s_{i'}^{j'}$ occurs with probability no
bigger than $2^{-c}$. There is exactly one initial in-state in each query, so similarly
as with the free in-states, the overall probability of a transcript being bad due
to a pair with an initial in-state is at most $q^2\ell/2^c$. By summing all the partial
collision probabilities we obtain that $\Pr[D_Y \in \mathcal{T}_{\mathrm{bad}}] \le (q\ell)^2/2^b + 2q^2\ell/2^c$. □


6 Security Analysis of FKD 


For FKD, we prove the following result: 

Theorem 2. Let b, r, c, k > 0 be such that $b = r + c$ and $k \le c$. Let FKD be the
scheme of Sect. 3.2. Then,

$$\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKD}^p_K, p}(q, \ell, \mu, N) \le \frac{(q\ell)^2}{2^b} + \frac{(q\ell)^2}{2^c} + \frac{\mu N}{2^k}.$$

The proof uses Lemma 3 to transform an FKD adversary into an FKS adversary,
similarly to [8,10]. While this would be sufficient to prove the security
of the Duplex construction, the bound induced solely by Lemma 3 suffers
from a quantitative degradation: we have that $\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKD}^p_K, p}(q, \ell, \mu, N) \le
\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKS}^p_K, p}(q\ell, \ell, \mu, N)$, resulting in a bound $\frac{2q^2\ell^4}{2^b} + \frac{2q^2\ell^3}{2^c} + \frac{\mu N}{2^k}$ according to
Theorem 1 (substituting $q\ell$ for q). In reality, there is a quantitative gap between the security of
the FKD construction and that of FKS, but it is smaller. This is because
an FKS adversary constructed from an FKD adversary issues queries of a specific
structure which is far from general. In the proof for FKD below, we use this
property. In more detail, we derive a specific class of “constrained adversaries”
and generalize the proof of Lemma 2 to these adversaries.
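To get a feeling for the size of this gap, the bounds can be evaluated exactly for concrete parameters. The following Python sketch encodes the bound of Theorem 1, the bound of Theorem 2, and the weaker bound obtained by substituting q·ℓ for q in Theorem 1; the function names and any parameter values passed to them are illustrative assumptions, not values from the paper.

```python
from fractions import Fraction


def fks_bound(q, l, mu, n, b, c, k):
    """Bound of Theorem 1: 2(ql)^2/2^b + 2q^2 l/2^c + mu*N/2^k."""
    return (Fraction(2 * (q * l) ** 2, 2 ** b)
            + Fraction(2 * q ** 2 * l, 2 ** c)
            + Fraction(mu * n, 2 ** k))


def fkd_bound(q, l, mu, n, b, c, k):
    """Bound of Theorem 2: (ql)^2/2^b + (ql)^2/2^c + mu*N/2^k."""
    return (Fraction((q * l) ** 2, 2 ** b)
            + Fraction((q * l) ** 2, 2 ** c)
            + Fraction(mu * n, 2 ** k))


def fkd_bound_via_lemma3_only(q, l, mu, n, b, c, k):
    """Looser FKD bound: Theorem 1 applied to an adversary making q*l queries."""
    return fks_bound(q * l, l, mu, n, b, c, k)
```

For every ℓ ≥ 1 the direct bound of Theorem 2 is at most the bound obtained via Lemma 3 alone, since (qℓ)² ≤ 2q²ℓ⁴ and (qℓ)² ≤ 2q²ℓ³.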

Proof (Proof of Theorem 2). Consider any adversary A with resources $(q, \ell,
\mu, N)$. We have that $\mathrm{FKD}^p_K = \mathrm{FKD}^{E^p_K}_0$. Therefore, by a modular argument,

$$\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKD}^p_K, p}(A) = \Delta_A(\mathrm{FKD}^{E^p_K}_0, p; RO_{\mathrm{FKD}}, p)$$

$$\le \Delta_B(\mathrm{FKD}^\pi_0, p; RO_{\mathrm{FKD}}, p) + \Delta_C(E^p_K, p; \pi, p)$$

$$\le \mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKD}^\pi_0}(B) + \mathbf{Adv}^{\mathrm{prp}}_{E^p_K, p}(C)$$

for some adversary B with resources $(q, \ell, \mu)$ and adversary C with resources
$(q\ell, \mu, N)$. Note that B also has access to p, but these queries are meaningless
as its left oracle ($\mathrm{FKD}^\pi_0$ or $RO_{\mathrm{FKD}}$) is independent of p.
In [2], it is proven that $\mathbf{Adv}^{\mathrm{prp}}_{E^p_K, p}(C) \le \mu N/2^k$. In Corollary 3 we show that
any FKD adversary B can be turned into a special “constrained” adversary B'
against FKS with resources $(q\ell, \ell, \mu)$:

$$\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKD}^\pi_0}(B) \le \mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKS}^\pi_0}(B').$$

In Lemma 4, we prove that $\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKS}^\pi_0}(B') \le (q\ell)^2/2^b + (q\ell)^2/2^c$ for any such
adversary B'. □

For the remainder of the proof, we introduce the mapping $Q_{\mathrm{FKS}} : (\{0,1\}^{<b})^+ \to
\{0,1\}^*$. For any $b > 0$ and for all $X_1, \ldots, X_n \in \{0,1\}^{<b}$ we let

$$Q_{\mathrm{FKS}}(X_1, \ldots, X_n) = \mathrm{pad}_b(X_1) \| \ldots \| \mathrm{pad}_b(X_{n-1}) \| X_n.$$

Lemma 3 (Duplexing Lemma [10]). Let b, r, c, k > 0 be such that $b = r + c$
and $k \le c$. Let $D = \mathrm{FKD}^p$ as defined in Sect. 3.2. Then for the i-th duplexing
query $(M_i, z_i)$ made after the last D.initialize(K) we have

$$Z_i = D.\mathrm{duplexing}(M_i, z_i) = \mathrm{FKS}^p(K, Q_{\mathrm{FKS}}(M_1, \ldots, M_i), z_i).$$

Moreover, the mapping $Q_{\mathrm{FKS}} : (\{0,1\}^{<b})^+ \to \{0,1\}^*$ is injective.

The proof of the lemma uses similar arguments as that of Bertoni et al. [10]. 
A complete proof can be found in the full version of this paper [21]. 
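The injectivity claim can be illustrated with a small sketch. It assumes that $\mathrm{pad}_b$ is the usual "append a 1, then zeros" padding to exactly $b$ bits (the padding rule itself is defined elsewhere in the paper, so this is an assumption), and it models bit strings as Python strings of '0' and '1'. The function names are illustrative only.

```python
def pad(x: str, b: int) -> str:
    """Assumed 10* padding: append '1', then zeros, up to exactly b bits."""
    if len(x) >= b:
        raise ValueError("block must be shorter than b bits")
    return x + "1" + "0" * (b - len(x) - 1)


def unpad(block: str) -> str:
    """Invert pad(): drop the trailing zeros and the final '1'."""
    return block[: block.rindex("1")]


def q_fks(blocks: list[str], b: int) -> str:
    """Q_FKS(X_1..X_n) = pad_b(X_1) || ... || pad_b(X_{n-1}) || X_n."""
    if not blocks or len(blocks[-1]) >= b:
        raise ValueError("need at least one block, each shorter than b bits")
    return "".join(pad(x, b) for x in blocks[:-1]) + blocks[-1]


def q_fks_inverse(s: str, b: int) -> list[str]:
    """Recover (X_1..X_n): every full b-bit chunk is a padded block and the
    remaining tail (shorter than b bits, possibly empty) is X_n."""
    full = len(s) // b
    head = [unpad(s[i * b:(i + 1) * b]) for i in range(full)]
    return head + [s[full * b:]]
```

Because the last block is always shorter than $b$ bits, the number of padded blocks is determined by the total length, which is what makes the inverse well defined.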

The result of Lemma 3 can be used to reduce any FKD adversary to a constrained FKS adversary. More specifically, any adversary $\mathsf{A}$ against FKD that makes $q$ initialize calls and duplexes $\ell$ blocks after each initialization can be reduced to a constrained FKS adversary $\mathsf{A}' = R_{\mathrm{FKS}}(\mathsf{A})$. To answer the $j$-th duplexing query $(M^{i}_{j}, z^{i}_{j})$ made by $\mathsf{A}$ after the $i$-th initialize call, $\mathsf{A}'$ queries its own oracle with $(Q_{\mathrm{FKS}}(M^{i}_{1}, \ldots, M^{i}_{j}), z^{i}_{j})$. $\mathsf{A}'$ copies the output of $\mathsf{A}$ at the end of the experiment.

Corollary 3. Let $\mathsf{A}$ be an adversary against FKD that makes $q$ initialize calls and duplexes $\ell$ blocks after each initialization, and let $R_{\mathrm{FKS}}(\mathsf{A})$ be the constrained FKS adversary as defined above. It follows from Lemma 3 that

$$\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKD}^{\pi}_{0},\pi}(\mathsf{A}) \le \mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKS}^{\pi}_{0},\pi}\bigl(R_{\mathrm{FKS}}(\mathsf{A})\bigr).$$

We denote by $\mathcal{A}'_{q,\ell}$ the set of constrained adversaries against FKS that were induced by some FKD adversary that makes $q$ initialize calls and duplexes $\ell$ blocks after each initialization:

$$\mathcal{A}'_{q,\ell} = \{ R_{\mathrm{FKS}}(\mathsf{A}) : \mathsf{A} \text{ an FKD adversary with resources } (q, \ell) \}.$$

Lemma 4. Let $b, r, c > 0$ be such that $b = r + c$. Let FKS be the scheme of Sect. 3.1. Then,

$$\mathbf{Adv}^{\mathrm{ind}}_{\mathrm{FKS}^{\pi}_{0},\pi}(\mathsf{A}') \le \frac{(q\ell)^2}{2^{b}} + \frac{(q\ell)^2}{2^{c}}$$

for any constrained adversary $\mathsf{A}' \in \mathcal{A}'_{q,\ell}$.

The proof follows to a large extent the framework of the proof of Lemma 2. We show in particular that, although the constrained adversary makes $q\ell$ queries, each query induces only a single free or initial state; the remaining internal in-states, if any, are always identical to the in-states of a previous query and thus do not contribute to the probability of observing a bad transcript. This gives us at most $q\ell$ free or initial in-states and the bound follows. A complete proof can be found in the full version of this paper [21].

7 Full-State SpongeWrap and its Security

Our results from Sect. 6 can be used to prove security of modified, more efficient versions of existing Sponge-based AE schemes. As an interesting instance, we introduce Full-State SpongeWrap, a variant of the authenticated encryption mode SpongeWrap [8, 10], offering improved efficiency with respect to processing of associated data (AD).

7.1 Authenticated Encryption for Sequences of Messages 

We will focus on authenticated encryption schemes that act on sequences of AD-message pairs. Following Bertoni et al.^3 [8,10], we will think of an authenticated encryption scheme as an object $W$ surfacing three APIs:

- W.initialize($K, N$): calling this function will initialize $W$ with a secret key $K$ from the set of keys $\mathcal{K}$ and a nonce $N$ from the set of nonces $\mathcal{N}$.

- W.wrap($A, M$): this function inputs an AD-message pair $(A, M)$ and outputs a ciphertext-tag pair $(C, T)$, where $|C| = |M|$ and $T$ is a $\tau$-bit tag authenticating $(A, M)$ and all the queries processed by $W$ so far (i.e., since the last initialization call).

- W.unwrap($A, C, T$): this function accepts a triple of AD, ciphertext and tag, and outputs a message $M$ if $C$ is an encryption of $M$ and $T$ is a valid tag for $(A, M)$ and all the previous queries processed by $W$ so far; otherwise it outputs an error symbol $\bot$.

Here, the AD, messages and ciphertexts are finite strings and we have $|C| = |M|$. $\tau$ is a positive integer and we call it the expansion of $W$. We require that $W$ is initialized before making the first wrapping or unwrapping call. For a given key $K$, we will use $W_K$ to refer to the corresponding keyed instance, omitting $K$ from the list of inputs; that is, W.initialize($K, N$) = $W_K$.initialize($N$).

Security of Authenticated Encryption. We follow Bertoni et al. [8, 10] for defining the security of AE. We split the twofold security goal of AE into two separate requirements: privacy and authenticity.

Let $W$ be a scheme for authenticated encryption, as described above, that internally makes calls to a public random permutation $p$. We formalize the privacy of $W$ by an experiment in which an adversary $\mathsf{A}$ is given access to $p, p^{-1}$ and an oracle $O$ that provides two interfaces: O.initialize($N$) and O.wrap($A, M$). We have $O \in \{W_K, \mathrm{RO}_W\}$, where $W_K$ is an instance of the real scheme with the key $K$, and $\mathrm{RO}_W$ is an ideal primitive that acts as follows: it keeps a list of strings $St \in (\{0,1\}^{*})^{*}$ as its internal state. On calling $\mathrm{RO}_W$.initialize($N$) the list $St$ is set to the empty list and then the nonce $N$ is added to the list (denote this operation by $St \leftarrow St \,\|\, N$); now each call $\mathrm{RO}_W$.wrap($A, M$) will first update the list as $St \leftarrow St \,\|\, (A, M)$ and then will output $\mathrm{left}_{|M|+\tau}(\mathrm{RO}^{\infty}(\langle St \rangle))$, where $\langle St \rangle$ denotes an injective encoding of the list $St$ into a string in $\{0,1\}^{*}$. (Note that the list $St$ preserves the boundaries between $N$ and all the queried AD-message pairs.)

3 Bertoni et al. do not consider an explicit nonce as we do; they rather require the header of the first wrapping call to be unique.
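The ideal primitive $\mathrm{RO}_W$ can be sketched as follows. This is only an illustration under stated assumptions: SHAKE-256 stands in for the random oracle $\mathrm{RO}^{\infty}$, strings are byte strings rather than bit strings, the tag length is given in bytes, and the length-prefixed encoding is one possible choice of injective encoding $\langle St \rangle$. The class name `IdealWrap` is hypothetical.

```python
import hashlib


class IdealWrap:
    """Toy model of RO_W: the output depends only on the nonce and the full
    history of (AD, message) pairs since the last initialize call."""

    def __init__(self, tau_bytes: int):
        self.tau = tau_bytes   # expansion (tag length), in bytes here
        self.st = None         # the list St; None until initialized

    def initialize(self, nonce: bytes) -> None:
        # St is reset to the empty list, then the nonce is appended.
        self.st = [nonce]

    @staticmethod
    def _encode(items: list[bytes]) -> bytes:
        # Length-prefixing each string keeps all boundaries recoverable,
        # so distinct lists never encode to the same byte string.
        return b"".join(len(s).to_bytes(8, "big") + s for s in items)

    def wrap(self, ad: bytes, msg: bytes) -> bytes:
        if self.st is None:
            raise RuntimeError("initialize must be called before wrap")
        self.st += [ad, msg]   # St <- St || (A, M)
        # left_{|M|+tau}(RO(<St>)): SHAKE-256 acts as the oracle stand-in.
        return hashlib.shake_256(self._encode(self.st)).digest(len(msg) + self.tau)
```

In the privacy experiment the adversary sees either outputs of this kind or the real ciphertext-tag pairs, and must tell the two apart.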

The adversary must distinguish between the two worlds: the real world where it is interacting with $W_K$ and the ideal world where it is interacting with $\mathrm{RO}_W$. The advantage of the adversary in doing so is defined as

$$\mathbf{Adv}^{\mathrm{priv}}_{W}(\mathsf{A}) = \Pr\bigl[K \xleftarrow{\$} \mathcal{K} : \mathsf{A}^{W_K, p, p^{-1}} \Rightarrow 1\bigr] - \Pr\bigl[\mathsf{A}^{\mathrm{RO}_W, p, p^{-1}} \Rightarrow 1\bigr].$$

It is assumed that the adversary meets the nonce-requirement, i.e., that every initialize() call it makes is done with a fresh nonce.

For the definition of the authenticity property, consider an experiment where an adversary $\mathsf{A}$ is given access to the oracle $W_K$ and is allowed to ask the queries $W_K$.initialize($N$) and $W_K$.wrap($A, M$). It is assumed that $\mathsf{A}$ respects the nonce-requirement in the wrapping queries. $\mathsf{A}$ is again allowed to query $p$. The adversary can also attempt forgeries at any time during the experiment; we say that the adversary forges if it outputs a sequence $(N, (A_1, C_1, T_1), \ldots, (A_n, C_n, T_n))$ such that after calling W.initialize($K, N$) and then W.unwrap($A_i, C_i, T_i$) for $1 \le i \le n-1$, W.unwrap($A_n, C_n, T_n$) does not return $\bot$. The sequence $(N, (A_1, C_1, T_1), \ldots, (A_n, C_n, T_n))$ must be such that the adversary has not obtained $(C_n, T_n)$ from a wrapping query that followed an initialization with $N$ and a series of wrapping queries $(A_1, M_1), \ldots, (A_n, M_n)$ with some $M_1, \ldots, M_n$. The adversary does not have to use a unique nonce in the forgery. Note that it can be assumed w.l.o.g. that every forgery attempt is either a fresh nonce followed by a single AD-ciphertext-tag triplet or of the form $(N, (A_1, C_1, T_1), \ldots, (A_n, C_n, T_n))$ with $(N, (A_1, C_1, T_1), \ldots, (A_{n-1}, C_{n-1}, T_{n-1}))$ being learned by the adversary from a sequence of previous wrapping queries. We define the advantage of $\mathsf{A}$ as

$$\mathbf{Adv}^{\mathrm{auth}}_{W}(\mathsf{A}) = \Pr\bigl[K \xleftarrow{\$} \mathcal{K} : \mathsf{A}^{W_K, p, p^{-1}} \text{ forges}\bigr].$$

We let $\mathbf{Adv}^{\mathrm{auth}}_{W}(q_v, q, \ell, \mu, N) = \max_{\mathsf{A}} \mathbf{Adv}^{\mathrm{auth}}_{W}(\mathsf{A})$ be the maximum advantage over all adversaries that make $q$ initialize queries to the left oracle, and after each initialization do wrapping queries that induce at most $\ell$ permutation calls (including the initialization) and with total maximal multiplicity $\mu$, and that make $N$ direct queries to the public permutation, and that make at most $q_v$ forgery attempts. We similarly let $\mathbf{Adv}^{\mathrm{priv}}_{W}(q, \ell, \mu, N) = \max_{\mathsf{A}} \mathbf{Adv}^{\mathrm{priv}}_{W}(\mathsf{A})$.
Algorithm 3. Outline of an FSW[$p, r, k, n, \tau$] wrap/unwrap($A, M$) query

while there are both AD and message bits to process do
    take $\le r$ bit block of $M$ and $\le c - 5$ bit block of $A$
    wrap/unwrap the message block
    if both $A$ and $M$ end then
        produce tag using frame bits $\overline{F}_{AM}$
    else if only $A$ ends or only $M$ ends then
        process the blocks using frame bits $F_{AM|}$
    else
        process the blocks using frame bits $F_{AM}$
while there are message bits to process do
    take $\le r$ bit block of $M$
    wrap/unwrap the message block
    if $M$ ends then
        produce tag using frame bits $\overline{F}_{M}$
    else
        process the blocks using frame bits $F_{M}$
while there are AD bits to process do
    take $\le r + c - 5$ bit block of $A$, split it into $r$ bit and $c - 5$ bit parts
    if $A$ ends then
        produce tag using frame bits $\overline{F}_{A}$
    else
        process the parts using frame bits $F_{A}$
prepare $r$ random bits for next query using frame bits $F_{N}$


7.2 Full-State SpongeWrap

The Full-State SpongeWrap (FSW) is a permutation mode for authenticated encryption of AD-message sequences as described in Sect. 7.1. It is parametrized by a $b$-bit permutation $p$, the maximal message block size $r$, the key size $k$, the nonce size $n$, and the tag size $\tau > 0$. We require that $k \le b - r =: c$ and $n < r$. The set of keys is $\mathcal{K} = \{0,1\}^{k}$ and the set of nonces is $\mathcal{N} = \{0,1\}^{n}$. The FSW construction uses an instance of FKD internally to process the inputs block by block. To ensure domain separation of different stages of processing a query, we use three frame bits placed at the same position in each duplexing call to FKD as explained in Table 1.

Table 1. Labeling and usage of the frame bits within FSW.

Label                   Value   Usage
$F_{N}$                 000     process nonce, derive initial mask of a query
$F_{AM}$                001     block of A and M inside query
$F_{M}$                 010     block of M inside query
$F_{A}$                 011     block of A inside query
$F_{AM|}$               100     last block of A and M inside query
$\overline{F}_{AM}$     101     last block of A and M, query ends, produces tag
$\overline{F}_{M}$      110     last block of M, query ends, produces tag
$\overline{F}_{A}$      111     last block of A, query ends, produces tag

The main motivation of the FSW is concurrent absorption of message and AD to achieve maximal efficiency in terms of minimizing the number of permutation calls made. Since we can only process $r$ bits of a message input at a time, we can use the remainder of the state for the frame bits and a block of AD. This determines the lengths of message and AD blocks processed with each permutation call: $r + 1$ bits for the padded message block, 3 frame bits, and (having in mind that the input to FKD is always padded) at most $(b - 1) - (r + 1) - 3 = c - 5$ bits for a block of AD. To minimize the number of permutation calls made in all possible situations, we further specify special treatment for the wrap/unwrap queries with more AD blocks than message blocks. An informal outline of a wrap/unwrap query is given in Algorithm 3. This outline illustrates how the frame bits are used for domain separation.

We next give a complete algorithmic description of the FSW. To keep it compact, we introduce the following notations. For any $L \in \{0,1\}^{\le r}$, $R \in \{0,1\}^{\le c-5}$ and $F \in \{0,1\}^{3}$, we let

$$Q(L, F, R) = \mathrm{pad}_{r+1}(L) \,\|\, F \,\|\, R. \qquad (3)$$

Note that $r + 4 \le |Q(L, F, R)| \le b - 1$ for any $L, F, R$. We let $(L, R) = \mathrm{lsplit}(X, n)$ for any $X \in \{0,1\}^{*}$ such that $L = \mathrm{left}_{\min(|X|, n)}(X)$ and $R = \mathrm{right}_{|X| - |L|}(X)$. We let $X_1 \| X_2 \| \cdots \| X_m \xleftarrow{r} X$ denote partitioning a string $X$ in such a way that $X = X_1 \| X_2 \| \cdots \| X_m$, $|X_i| = r$ for $1 \le i < m$ and $0 < |X_m| \le r$. Note that $m = \lceil |X| / r \rceil$. We will use the abbreviation D.dpx($M, z$) for the interface D.duplexing($M, z$) of an FKD $D$. The interfaces of FSW[$p, r, k, n, \tau$] are defined in Algorithm 4. A schematic depiction of how the wrap interface processes various types of inputs can be found in the full version of this paper [21].
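The three helper notations can be made concrete with a short sketch. It models bit strings as Python strings of '0' and '1' and assumes that $\mathrm{pad}_{x}$ appends a single 1 followed by zeros up to $x$ bits; the padding rule is not restated in this section, so treat it as an assumption. The function names are illustrative.

```python
def pad(x: str, n: int) -> str:
    """Assumed padding: x || 1 || 0* up to exactly n bits (needs |x| < n)."""
    if len(x) >= n:
        raise ValueError("input too long to pad")
    return x + "1" + "0" * (n - len(x) - 1)


def frame_query(L: str, F: str, R: str, r: int, c: int) -> str:
    """Q(L, F, R) = pad_{r+1}(L) || F || R, with |L| <= r, |F| = 3, |R| <= c-5."""
    if len(L) > r or len(F) != 3 or len(R) > c - 5:
        raise ValueError("block lengths out of range")
    return pad(L, r + 1) + F + R


def lsplit(X: str, n: int) -> tuple[str, str]:
    """Split X into its leftmost min(|X|, n) bits and the remaining bits."""
    L = X[: min(len(X), n)]
    return L, X[len(L):]


def partition(X: str, r: int) -> list[str]:
    """Cut X into r-bit blocks; only the last block may be shorter."""
    return [X[i:i + r] for i in range(0, len(X), r)]
```

With $r = 4$ and $c = 8$ (so $b = 12$), every output of `frame_query` has between $r + 4 = 8$ and $b - 1 = 11$ bits, matching the length remark above.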


7.3 Security of FSW 

The security of FSW is relatively easy to analyze, thanks to the result from 
Sect. 6. 

Lemma 5. Let $W$ = FSW[$p, r, k, n, \tau$] be an instance of FSW as described in Sect. 7.2. Denote any query to W.initialize and a list of subsequent queries to W.wrap by $(N, (A_1, M_1), \ldots, (A_n, M_n))$. Then, FSW injectively maps this sequence to a sequence of corresponding FKD duplexing queries $(Q_1, \ldots, Q_d)$.

We prove the injectivity of the mapping by showing how it can be inverted. Thanks to the way the frame bits are used (Fig. 4), it is possible to determine which duplexing calls belong to a single wrap query. More than that, we can also determine the boundaries of message and AD using the frame bits and then we can reconstruct them thanks to the use of the padding. The full proof can be found in the full version of this paper [21].
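Algorithm 4. FSW[$p, r, k, n, \tau$]

Interface W.initialize($K, N$)
    D.initialize($K$)
    $S \leftarrow \mathrm{pad}_{r}(N) \,\|\, 0 \,\|\, F_{N} \,\|\, 0^{c-5}$
    $Z \leftarrow$ D.dpx($S, r$)

Interface W.wrap($A, M$)
    $M_1 \| \cdots \| M_m \xleftarrow{r} M$
    $(A', A^{*}) \leftarrow \mathrm{lsplit}(A, m(c-5))$
    $A'_1 \| \cdots \| A'_{a'} \xleftarrow{c-5} A'$
    $A^{*}_1 \| \cdots \| A^{*}_{a^{*}} \xleftarrow{r+c-5} A^{*}$
    if $m = a' = a^{*} = 0$ then
        $T \leftarrow \varepsilon$
        $F \leftarrow \overline{F}_{A}$
    for $i \leftarrow 1$ to $a' - 1$ do
        $C_i \leftarrow M_i \oplus Z$
        $Z \leftarrow$ D.dpx($Q(M_i, F_{AM}, A'_i), r$)
    if $0 < a' < m$ or $0 < a', a^{*}$ then
        $C_{a'} \leftarrow M_{a'} \oplus \mathrm{left}_{|M_{a'}|}(Z)$
        $Z \leftarrow$ D.dpx($Q(M_{a'}, F_{AM|}, A'_{a'}), r$)
    else if $0 < m = a'$ and $a^{*} = 0$ then
        $C_{a'} \leftarrow M_{a'} \oplus \mathrm{left}_{|M_{a'}|}(Z)$
        $T \leftarrow$ D.dpx($Q(M_{a'}, \overline{F}_{AM}, A'_{a'}), r$)
        $F \leftarrow \overline{F}_{AM}$
    for $i \leftarrow a' + 1$ to $m - 1$ do
        $C_i \leftarrow M_i \oplus Z$
        $Z \leftarrow$ D.dpx($Q(M_i, F_{M}, \varepsilon), r$)
    if $a' < m$ then
        $C_m \leftarrow M_m \oplus \mathrm{left}_{|M_m|}(Z)$
        $T \leftarrow$ D.dpx($Q(M_m, \overline{F}_{M}, \varepsilon), r$)
        $F \leftarrow \overline{F}_{M}$
    for $i \leftarrow 1$ to $a^{*} - 1$ do
        $(L, R) \leftarrow \mathrm{lsplit}(A^{*}_i, r)$
        D.dpx($Q(L, F_{A}, R), 0$)
    if $a^{*} > 0$ then
        $(L, R) \leftarrow \mathrm{lsplit}(A^{*}_{a^{*}}, r)$
        $T \leftarrow$ D.dpx($Q(L, \overline{F}_{A}, R), r$)
        $F \leftarrow \overline{F}_{A}$
    while $|T| < \tau$ do
        $T \leftarrow T \,\|\,$ D.dpx($Q(\varepsilon, F, \varepsilon), r$)
    $Z \leftarrow$ D.dpx($Q(\varepsilon, F_{N}, \varepsilon), r$)
    $C \leftarrow C_1 \| \cdots \| C_m$
    return $C, \mathrm{left}_{\tau}(T)$

Interface W.unwrap($A, C, T$)
    $C_1 \| \cdots \| C_m \xleftarrow{r} C$
    $(A', A^{*}) \leftarrow \mathrm{lsplit}(A, m(c-5))$
    $A'_1 \| \cdots \| A'_{a'} \xleftarrow{c-5} A'$
    $A^{*}_1 \| \cdots \| A^{*}_{a^{*}} \xleftarrow{r+c-5} A^{*}$
    if $m = a' = a^{*} = 0$ then
        $T' \leftarrow \varepsilon$
        $F \leftarrow \overline{F}_{A}$
    for $i \leftarrow 1$ to $a' - 1$ do
        $M_i \leftarrow C_i \oplus Z$
        $Z \leftarrow$ D.dpx($Q(M_i, F_{AM}, A'_i), r$)
    if $0 < a' < m$ or $0 < a', a^{*}$ then
        $M_{a'} \leftarrow C_{a'} \oplus \mathrm{left}_{|C_{a'}|}(Z)$
        $Z \leftarrow$ D.dpx($Q(M_{a'}, F_{AM|}, A'_{a'}), r$)
    else if $0 < m = a'$ and $a^{*} = 0$ then
        $M_{a'} \leftarrow C_{a'} \oplus \mathrm{left}_{|C_{a'}|}(Z)$
        $T' \leftarrow$ D.dpx($Q(M_{a'}, \overline{F}_{AM}, A'_{a'}), r$)
        $F \leftarrow \overline{F}_{AM}$
    for $i \leftarrow a' + 1$ to $m - 1$ do
        $M_i \leftarrow C_i \oplus Z$
        $Z \leftarrow$ D.dpx($Q(M_i, F_{M}, \varepsilon), r$)
    if $a' < m$ then
        $M_m \leftarrow C_m \oplus \mathrm{left}_{|C_m|}(Z)$
        $T' \leftarrow$ D.dpx($Q(M_m, \overline{F}_{M}, \varepsilon), r$)
        $F \leftarrow \overline{F}_{M}$
    for $i \leftarrow 1$ to $a^{*} - 1$ do
        $(L, R) \leftarrow \mathrm{lsplit}(A^{*}_i, r)$
        D.dpx($Q(L, F_{A}, R), 0$)
    if $a^{*} > 0$ then
        $(L, R) \leftarrow \mathrm{lsplit}(A^{*}_{a^{*}}, r)$
        $T' \leftarrow$ D.dpx($Q(L, \overline{F}_{A}, R), r$)
        $F \leftarrow \overline{F}_{A}$
    while $|T'| < \tau$ do
        $T' \leftarrow T' \,\|\,$ D.dpx($Q(\varepsilon, F, \varepsilon), r$)
    $Z \leftarrow$ D.dpx($Q(\varepsilon, F_{N}, \varepsilon), r$)
    $M \leftarrow M_1 \| \cdots \| M_m$
    if $T = \mathrm{left}_{\tau}(T')$ then
        return $M$
    else
        return $\bot$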
Fig. 4. The tree of all possible frame bits sequences for a single AD-message pair
(top-left). The composition of an FKD query $Q_i$ (bottom-right).


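Theorem 3. Let $b, r, c, k, n, \tau > 0$ be such that $b = r + c$, $k \le c$ and $n < r$. Let FSW be the scheme of Sect. 7.2. Then,

$$\mathbf{Adv}^{\mathrm{priv}}_{\mathrm{FSW}}(q, \ell, \mu, N) \le \frac{(q\ell)^2}{2^{b}} + \frac{(q\ell)^2}{2^{c}} + \frac{\mu N}{2^{k}},$$

$$\mathbf{Adv}^{\mathrm{auth}}_{\mathrm{FSW}}(q_v, q, \ell, \mu, N) \le \frac{(q\ell)^2}{2^{b}} + \frac{(q\ell)^2}{2^{c}} + \frac{\mu N}{2^{k}} + \frac{q_v}{2^{\tau}}.$$

We start by defining ROFSW, an idealized FSW that internally uses $\mathrm{RO}_{\mathrm{FKD}}$ instead of FKD (and thus does not use $p$ at all). By Theorem 2 we have that

$$\mathbf{Adv}^{\mathrm{priv}}_{\mathrm{FSW}}(q, \ell, \mu, N) \le \mathbf{Adv}^{\mathrm{priv}}_{\mathrm{ROFSW}}(q, \ell, \mu) + \frac{(q\ell)^2}{2^{b}} + \frac{(q\ell)^2}{2^{c}} + \frac{\mu N}{2^{k}},$$

$$\mathbf{Adv}^{\mathrm{auth}}_{\mathrm{FSW}}(q_v, q, \ell, \mu, N) \le \mathbf{Adv}^{\mathrm{auth}}_{\mathrm{ROFSW}}(q_v, q, \ell, \mu) + \frac{(q\ell)^2}{2^{b}} + \frac{(q\ell)^2}{2^{c}} + \frac{\mu N}{2^{k}}.$$

We consequently analyse the security of ROFSW, which is a relatively straightforward task because it internally uses $\mathrm{RO}_{\mathrm{FKD}}$. We obtain $\mathbf{Adv}^{\mathrm{priv}}_{\mathrm{ROFSW}}(q, \ell, \mu) = 0$ and $\mathbf{Adv}^{\mathrm{auth}}_{\mathrm{ROFSW}}(q_v, q, \ell, \mu) \le q_v / 2^{\tau}$. A complete proof can be found in the full version of this paper [21].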
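To get a feeling for the bounds of Theorem 3, the following sketch evaluates them exactly with rational arithmetic. The function name and the toy parameter values in the usage note are illustrative assumptions; realistic parameters (for example a 1600-bit permutation) would be plugged in the same way.

```python
from fractions import Fraction


def fsw_bounds(q, ell, mu, N, q_v, b, c, k, tau):
    """Return (privacy bound, authenticity bound) from Theorem 3 as exact
    fractions: (q*ell)^2/2^b + (q*ell)^2/2^c + mu*N/2^k [+ q_v/2^tau]."""
    common = (
        Fraction((q * ell) ** 2, 2 ** b)
        + Fraction((q * ell) ** 2, 2 ** c)
        + Fraction(mu * N, 2 ** k)
    )
    return common, common + Fraction(q_v, 2 ** tau)
```

For instance, with the toy values q = 2, ell = 2, mu = 1, N = 4, q_v = 1, b = 16, c = 8, k = 8 and tau = 8, the two bounds are 321/4096 and 337/4096; the $(q\ell)^2/2^{c}$ term dominates, as it does for realistic parameters whenever $c$ is the smallest exponent.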


8 Discussion 

Related-Key Security. Our treatment of the security of the full-state constructions is in the traditional model where the adversary has no control over selection of the secret keys or relations among different keys. If one considers the stronger model of related-key attack security then care must be taken in utilizing these schemes. Indeed, if an adversary has access to two instances $F_1 = \mathrm{FKS}^{p}_{K_1}$ and $F_2 = \mathrm{FKS}^{p}_{K_2}$, and it knows the relation $\Delta = K_1 \oplus K_2$, then it can make the outputs of $F_1$ and $F_2$ collide trivially by asking two $b$-bit queries $F_1(M)$ and $F_2(M \oplus \Delta)$.
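The collision can be reproduced in a toy model. The sketch below assumes, as the discussion implies, that the first full-state block is xored onto a state that contains the key before the permutation is applied; it reduces the construction to that single step. The function `p` is a hash-based stand-in for the public permutation (it is not a real permutation), and the keys are modelled as already embedded into full-width states. All names are hypothetical.

```python
import hashlib

STATE_BYTES = 16  # toy state width b, in bytes


def p(state: bytes) -> bytes:
    """Stand-in for the public permutation; any fixed function shows the effect."""
    return hashlib.sha256(state).digest()[:STATE_BYTES]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def first_block_output(key_state: bytes, block: bytes) -> bytes:
    """One full-state absorption: xor the b-bit block onto the keyed state,
    then apply the permutation."""
    return p(xor(key_state, block))


k1 = bytes(range(STATE_BYTES))
k2 = bytes(reversed(range(STATE_BYTES)))
delta = xor(k1, k2)                      # known key relation
m = b"full-state block"                  # exactly STATE_BYTES bytes

out1 = first_block_output(k1, m)
out2 = first_block_output(k2, xor(m, delta))
```

Both calls feed the same value $K_1 \oplus M$ into the permutation, so `out1` and `out2` are equal regardless of the keys.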

Although it is outside the scope of this paper to treat related-key security thoroughly, we informally propose some easy solutions to prevent trivial related-key attacks like the one mentioned before. We start by noticing that the inner-keyed Sponge construction [2] is not susceptible to this problem, as the secret key and the adversarial data blocks never overlap; hence, a simple way of thwarting such trivial related-key attacks is to always prepend the input data with a block of $b$ zeroes. Thus the adversary can no longer xor an arbitrary value directly to the key prior to the application of the permutation. If the original adversarial resources were $(q, \ell, \mu, N)$, we can without any further argumentation use the bound with the resources $(q, \ell + 1, \mu, N)$ for this new construction.

Another possibility would be to slightly modify the constructions and partition the input data into an $r$-bit starting block and $b$-bit blocks afterward. The initial block would be xored to the outer $r$ bits of the initial state. Our security analysis would carry over to this construction with minimal modifications.

Generalized Security Model. The security analyses of FKS and FKD cover 
those of the original Sponge and Duplex constructions as special cases. Beyond 
that, for the security analysis of FKD itself, we have generalized the security 
model of the original Duplex construction from Bertoni et al. [9,10]. While in 
the analysis of Bertoni et al. the analysis of the multiple-initializations scenario 
is left rather implicit, we include it explicitly in our model. 

This generalized setting seems to match more closely the use of the Duplex
construction in several AE schemes which do not require sessions and new session 
keys, where one would initialize the Duplex (or FKD) construction for every 
query. This is well demonstrated by the example of FSW. More precisely, the 
way we design and analyze the security of FSW allows for a very versatile use. 
FSW can be used to secure AD-message pairs in a single session [12], i.e. using 
a single initialize call during the lifetime of the key or alternatively every AD- 
message pair can be preceded by an initialize call with a unique nonce. In fact, 
FSW can be used for anything between these two extremes; for example, a 
setting where every AD-message pair is processed with a unique nonce, but can 
get fragmented into smaller sub-pairs. The security analysis of FSW covers each 
of these use cases. 

On the Keying of the Sponge. As we have claimed in the introduction, the difference in the security of the outer-keyed and inner-keyed Sponges vanishes in the presence of full-state absorption. On one hand, using a key of more than $c$ bits does not increase the security level, as the extra bits cannot be used by the low-entropy Even-Mansour construction. On the other hand, absorbing several $b$-bit blocks of the key only results in a derived key of effective length of $c$ bits. We remark that both the outer- and inner-keyed Sponges can be seen as special cases of FKS, by using more restrictive padding rules that only place the message blocks in the outer part of the state.

Boosting Sponge-based AE. Out of 57 CAESAR candidates, 10 are using a Sponge-based design. The method we used to enhance SpongeWrap can be straightforwardly adjusted to boost the performance of five of these 10 schemes: Keyak, Ketje, STRIBOB, CBEAM and ICEPOLE [3]. This is because all the said designs are using frame bits for domain separation. The other designs cannot benefit from our modifications, either due to a domain separation method relying on intangibility of the inner part of the state (NORX), or due to producing the tag from the inner part of the state (Ascon, Primates), or because they are already using the inner part of the state (Artemia), or because the designs do not follow the general structure of the SpongeWrap (Pi Cipher) [3]. We note that if Ketje was to benefit from the technique we have introduced, it would be necessary to increase the number of rounds of the underlying permutation.
Abstract. Differential and linear cryptanalysis are the general purpose
tools to analyze various cryptographic primitives. Both techniques have in 
common that they rely on the existence of good differential or linear char- 
acteristics. The difficulty of finding such characteristics depends on the 
primitive. For instance, AES is designed to be resistant against differential 
and linear attacks and therefore, provides upper bounds on the probabil- 
ity of possible linear characteristics. On the other hand, we have prim- 
itives like SHA-1, SHA-2, and Keccak, where finding good and useful 
characteristics is an open problem. This becomes particularly interesting 
when considering, for example, competitions like CAESAR. In such com- 
petitions, many cryptographic primitives are waiting for analysis. With- 
out suitable automatic tools, this is a virtually infeasible job. In recent 
years, various tools have been introduced to search for characteristics. The 
majority of these only deal with differential characteristics. In this work, 
we present a heuristic search tool which is capable of finding linear char- 
acteristics even for primitives with a relatively large state, and without a 
strongly aligned structure. As a proof of concept, we apply the presented
tool on the underlying permutations of the first round CAESAR candidates
Ascon, ICEPOLE, Keyak, Minalpher and Prøst.


Keywords: Linear cryptanalysis • Authenticated encryption • 
Automated tools • Guess-and-determine • CAESAR competition 


1 Introduction 

Research in symmetric cryptography in the last few years is mainly driven by dedicated high-profile open competitions such as NIST’s AES and SHA-3 selection procedures, or ECRYPT’s eSTREAM project. While these focused competitions in symmetric cryptography are generally viewed as having provided a tremendous increase in the understanding and confidence in the security of these cryptographic primitives, the impressive increase of submissions to such competitions reveals major problems related to the analytical effort for the cryptographic community. To better evaluate the security margin of the various submissions, automatic tools are needed to assist cryptanalysts with their work.

One important class of attacks is linear cryptanalysis [15,25]. The success of 
these attacks relies on the existence of suitable linear characteristics. The dif- 
ficulty of finding such characteristics depends on the primitive. For example, 
the wide-trail design strategy [7] incorporated by AES provides lower bounds on
the minimum number of active S-boxes in a linear characteristic and therefore, 
gives an upper bound on the highest possible bias. On the other hand, we have 
primitives with weak alignment [1], such as the winner of the SHA-3 competition 
KECCAK, where finding good characteristics is an open problem, and heuristic 
search results are required to evaluate the security margin of the primitive. This 
is particularly interesting in the context of the CAESAR competition [26]. We 
noticed that many first round submissions focus their analysis on differential 
cryptanalysis, but provide only few results for linear cryptanalysis. 

Our Contribution. The main contribution of this paper is a dedicated automatic tool for linear cryptanalysis, which is available on GitHub^1. The tool performs heuristic searches for good linear characteristics in cryptographic primitives. It was designed for primitives based on substitution-permutation networks (SP networks).

The modular design of the tool allows easy extension to other cryptographic 
primitives. It also allows to easily develop and test new dedicated search strate- 
gies. To facilitate further improvements and analysis, the tool is publicly avail- 
able and its source code is published together with this paper. Such a tool is 
particularly useful when designing new cryptographic primitives. It allows to 
easily explore the effects of, for instance, different S-boxes and linear layers on 
linear characteristics and reveals possible bad decisions in an early stage of the 
design process. Even in wide-trail designs with provable bounds, it can be useful 
to evaluate different choices for building blocks with respect to their long-term 
behaviour over a larger number of rounds, where the quality of the best charac- 
teristics can deviate significantly from the derived bounds (i.e., two algorithms 
with the same bounds may behave quite differently in a heuristic search, which 
can be a basis for the decision of choosing one design over the other). 

As a proof of concept and to demonstrate the advantages of the tool, we have chosen the first round CAESAR candidates Ascon [9], ICEPOLE [19], Keyak [4], Minalpher [22] and Prøst [13] as analysis targets. Ascon, ICEPOLE, and Keyak are sponge-based authenticated encryption schemes. All three primitives use permutations that are not strongly aligned, making it hard to find good linear characteristics. We demonstrate the capability of our automated search tool by giving linear characteristics suitable for different attack scenarios. In comparison, the permutations used in Minalpher and Prøst provide more “structure” by incorporating an “AES-like” design strategy. Hence, the designers of these two primitives are able to give computer-aided bounds on the minimum number of active S-boxes by using mixed-integer linear programming (MILP) for a number of rounds sufficient to thwart attacks. For Minalpher and Prøst, we show that our tool is capable of finding linear characteristics which match the provided bounds. Our results are summarized in Table 1 (Sect. 4).


1 https://github.com/iaikkrypto/lineartrails.
Related Work. While several automatic tools for differential cryptanalysis
have been published in the last few years [5,6,8,12,14,16,20,23], in particular 
for hash functions, the work on automatic tools dedicated to linear cryptanalysis 
is very limited. One example is a tool designed by Sun et al. [24], extending pre- 
vious work of Mouha et al. [21]. They model the differential and linear behavior 
of a block cipher as a mixed-integer linear program (MILP) and use general- 
purpose MILP tools to solve the optimization problem (i.e., find the optimal 
characteristics for the - often simplified - model of the cipher). This approach 
works well for lightweight ciphers like Simon or Present, but faces problems when 
it comes to large-state and less structured ciphers such as Ascon, ICEPOLE, 
and Keyak. Hence, a dedicated search tool for linear characteristics will com- 
plement the existing tools. 


Outline. This paper is divided into two main parts: the description of our new 
automated search tool for linear characteristics in Sect. 3, and its application 
to the CAESAR candidates in Sect. 4. However, first, we start with a short 
introduction to linear cryptanalysis and our notation in Sect. 2. Then, we deal 
with the propagation of linear masks in Sect. 3.2 and discuss the proposed search 
strategy for linear characteristics in Sect. 3.3. The application of the tool (Sect. 
4) is first discussed in detail for Keyak in Sect. 4.1. Then, our results for the 
other ciphers are summarized and briefly discussed in Sect. 4.2 to 4.5. Finally, 
we conclude in Sect. 5. 

2 Linear Cryptanalysis 

The goal of linear cryptanalysis [15,25] is to identify good affine linear approximations for the target cipher. More specifically, we want to find linear equations between the plaintext bits, ciphertext bits and key bits that hold with probability significantly different from $\frac{1}{2}$ (bias). Then, in the actual attack phase, these equations can be used to derive information on the key bits from known plaintext-ciphertext pairs.

For linear cryptanalysis, the operation of the cipher, or building blocks of the cipher, is considered as a vectorial boolean function $f : \mathbb{F}_2^m \to \mathbb{F}_2^n$ (where the key bits might be part of $\mathbb{F}_2^m$). A (probabilistic) linear relation between input and output bits of $f$ is then characterized by two linear masks $\alpha \in \mathbb{F}_2^m$, $\beta \in \mathbb{F}_2^n$. For $x \in \mathbb{F}_2^m$, $y \in \mathbb{F}_2^n$ with $y = f(x)$, the masks represent the relation

$$\alpha^t \cdot x = \beta^t \cdot y,$$

where $v^t \cdot w$ denotes the natural inner product of vectors. The quality of a linear approximation $\alpha, \beta$ is measured by the probability that the corresponding relation holds; or more precisely, by how far this probability deviates from the average $\frac{1}{2}$. This deviation is referred to as the bias $\varepsilon$ of the masks $\alpha, \beta$:

$$\varepsilon = \mathrm{bias}_f(\alpha, \beta) = \mathbb{P}\bigl[\alpha^t \cdot x = \beta^t \cdot y \mid y = f(x)\bigr] - \frac{1}{2} = \frac{1}{2^m}\Bigl(\bigl|\{x \in \mathbb{F}_2^m \mid \alpha^t \cdot x = \beta^t \cdot f(x)\}\bigr| - 2^{m-1}\Bigr).$$

If $m$ is very small, the expression for $\varepsilon$ can easily be evaluated explicitly for all masks $\alpha, \beta$ to determine the best masks. This information is summarized in the linear distribution table (LDT), where non-zero entries mark masks $\alpha, \beta$ with non-zero bias.
However, this is obviously infeasible for the complete cipher at once. To obtain an approximation of the complete cipher, it is split into smaller parts that are easier to analyze. Matsui’s piling-up lemma [15] is used to combine the individual biases of multiple building blocks to derive the overall bias (under the assumption that the validity of the partial approximations is independent). If $\varepsilon$ denotes the bias of the overall approximation of the block cipher, Matsui [15] showed that the necessary number of plaintext-ciphertext pairs to derive the bit of key information from the approximation is proportional to $\frac{1}{\varepsilon^2}$.
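As a small worked example, the piling-up lemma states that $k$ independent approximations with biases $\varepsilon_1, \ldots, \varepsilon_k$ combine to an overall bias of $2^{k-1} \prod_i \varepsilon_i$. The sketch below evaluates this and the resulting $1/\varepsilon^2$ data estimate; the function names are illustrative, and the constant of proportionality in the data complexity is omitted.

```python
from math import prod


def piling_up(biases: list[float]) -> float:
    """Overall bias of the combined approximation, assuming the partial
    approximations are independent."""
    return 2 ** (len(biases) - 1) * prod(biases)


def data_estimate(bias: float) -> float:
    """Number of known plaintext-ciphertext pairs, up to a constant factor."""
    return 1 / bias ** 2
```

For example, two approximations of bias $2^{-2}$ each combine to bias $2 \cdot 2^{-4} = 2^{-3}$, which corresponds to roughly $2^{6}$ known pairs. A single zero-bias step drives the product to zero, which is why every approximation in a characteristic must be a valid transition.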

The difficult part is to find a network or “trail” of partial approximations that are compatible with each other and give a good overall bias. In particular, each involved approximation must have non-zero bias, otherwise the overall bias becomes zero. For this reason, we refer to non-zero entries in the individual LDTs as “valid transitions” of masks for this building block. In the following, such a “trail” of partial linear approximations is called a linear characteristic.

Several algorithms and improvements thereof have been proposed for finding characteristics with the highest overall bias, typically by some form of branch-and-bound algorithm. For more complex, modern ciphers, such a complete search is not feasible. Two possible approaches to handle this situation are (a) to design ciphers in a way that allows proving bounds on the best possible bias, and (b) to use heuristic search methods to find stronger biases (for reduced versions of the cipher) to make better predictions on the security margin of the complete cipher.

In the following, we will focus on the second approach, and heuristically 
search for good characteristics. Unlike the original, complete search algorithms, 
our search will not proceed in a “linear”, round-by-round manner. Instead, we 
will take inspiration from similar searching tools for differential cryptanalysis [8] , 
and randomize the search order. This naturally implies that we will often start 
building inconsistent characteristics, which will need to be fixed or discarded. 


3 An Automated Tool for Linear Cryptanalysis 

The proposed automated tool can be roughly split into two main parts. The 
first part is described in Sect. 3.2 and deals with the description of crypto- 
graphic primitives within the search tool, including the representation of linear 
approximations and, most importantly, their propagation. The other part of the 
tool is the choice of the search algorithm to find good linear characteristics (see 
Sect. 3.3). Before we start with the description of the tool, we take a look at 
the requirements we have for the design and implementation of such a heuristic 
search tool. 

3.1 Implementation Requirements for the Search Tool 

In order for any automatic cryptanalysis tool to be useful for general application, 
for example to analyze the 57 first round CAESAR submissions, there are a 
number of flexibility and usability requirements: 

- Easy to Add New Primitives. This is one of the main goals for the cre- 
ation of this tool. To fulfill this requirement, we have decided to put the focus 
on primitives based on SP networks, i.e., with alternating S-box and linear 
layers. This simplifies the design process of the tool, since we did not have to 
consider every possible specialty, while still having a large group of applicable 
primitives. The programming interface should be designed to require as little 
effort as possible for converting, for example, a CAESAR reference implementation 
to a suitable cipher definition for the tool; ideally, it should be possible 
to just copy the corresponding code fragments for the round transformation 
steps. 

- An Easily Adaptable, Parameterized Search Algorithm. The linear 
tool implements a heuristic guess-and-determine search algorithm. This algo- 
rithm delivers good results for various primitives. However, the success of the 
search is highly dependent on various different parameters, such as the config- 
uration of the searching order and conflict-handling behavior. Therefore, it is 
crucial that these parameters can be adjusted easily. For this reason, our standard 
guess-and-determine algorithm is parameterizable via an XML file. This 
XML file specifies the search starting point and allows various 
other parameters to be configured. 

- Easy to Add Other Search Algorithms. The currently implemented, 
stack-based guess-and-determine algorithm is certainly not the only possible 
way to search for linear characteristics. To be open for new ideas and evaluate 
other algorithms, we have designed the tool in a way that the search algo- 
rithm is clearly separated from the description of the cipher and thus, can be 
replaced easily. This opens the door for experiments with various alternative 
search algorithms and will hopefully lead to new insights in this direction. 

- Portability of the Code. We do not want the tool to require a specific operating 
system or platform to run. Therefore, we have reduced the dependence 
on external libraries wherever possible, and avoided the use of 
platform-specific instructions. 


3.2 Propagation of Linear Masks 

Our overall search strategy is based on the “guess-and-determine” approach. 
We want to build a consistent linear characteristic with high bias step by step, 
starting from a “mostly unknown” (undetermined) characteristic of masks, and 
progressively deciding which bits should be selected (activated) by the final mask. 
For this purpose, we repeatedly “guess” the value of small parts of the masks, 
and then “determine” the consequences of this guess (in particular, whether this 
updated partial characteristic can still be completed to a “valid” characteristic). 
We refer to the “determining” step as propagation of information. 


Representation of Partial Linear Masks. The tool represents the linear 
masks on bit-level. During the search, we work with partially-determined search 
masks. We represent an active bit in the linear mask with 1 and an inactive bit of 
a linear mask with 0. Mask bits that are not yet determined are represented by ?. 


Propagation in SP Networks. We want to find linear characteristics for 
SP networks. Such a network consists of iterative applications of a substitution 
layer (consisting of relatively small S-boxes) and an (affine) linear layer (which 
typically covers larger parts of the state at once). We use different techniques 
for the propagation of information in these two layer types. The goal of the 
propagation step is to investigate whether the guess allows us to derive explicit 
values for other (“neighbouring”) bits, and in particular whether this explicit 
information is contradictory. The constraints that allow this propagation can be 
derived from the linear distribution table of the involved functions, since the 
characteristic must not contain any mask transitions with bias 0. 


Propagation in the Non-linear Layer. We only deal with non-linear layers 
which can be represented by parallel applications of S-boxes. So the propaga- 
tion of the linear masks at the input and the output of the S-boxes can be 
treated individually, since the parallel applications are considered independent 
of each other (any dependencies induced by the linking linear layers are treated 
separately). Therefore, we can do the propagation separately per S-box. 

Many state-of-the-art ciphers use relatively small S-boxes. In many recent 
cipher proposals, the S-boxes map 4- to 5-bit inputs to outputs of the same 
size. Even the largest S-boxes hardly ever exceed a size of 8 bits. Therefore, 
the propagation of the linear masks can be done in a brute-force manner, based 
on the linear distribution table (LDT) of the S-box. The LDT is an exhaustive 
list of all valid (biased) mask transitions from mask $\alpha$ to mask $\beta$. Our current 
“knowledge” of the values of some input and output mask bits limits the 
set of available transitions. Depending on the concrete values of $\alpha$ and $\beta$ and 
the remaining transition options, we have one of the following outcomes of the 
propagation: 

1. Contradiction: The LDT reveals that no valid, biased transitions remain 
that satisfy the fixed mask bits; i.e., there is no linear relationship involving 
the bits currently marked by $\alpha$ and $\beta$ as 1 (and optionally the ? bits). In other 
words, we have a contradiction. This means that the current, partially determined 
linear characteristic is in fact invalid. This situation has to be handled 
by the search algorithm by, e.g., backtracking and resolving the contradiction. 

2. Updated Bits: The LDT reveals that one or more biased transitions respecting 
the partially determined $\alpha$ and $\beta$ remain. In addition, all remaining transitions 
share the same value (0 or 1) for one or more of the current ? bits. 
Thus, we can refine some previously undefined bits in the masks to active 
or inactive bits by using information from the LDT. Before taking any further 
guesses, this newly won information must in turn be propagated in all 
connected function components. 

3. No Updates: The LDT reveals that $\alpha$ and $\beta$ are possible, but no additional 
explicit bit-wise information can be gained. Nothing else happens. 
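
The three outcomes above can be sketched in a few lines of Python. This is a minimal illustration and not the tool's implementation: the 3-bit S-box is an arbitrary example permutation, partial masks are written as strings over 0, 1 and ? (most significant bit first), and all function names are hypothetical.

```python
def parity(v):
    """Parity of the set bits of v, i.e. the GF(2) sum of the selected bits."""
    return bin(v).count("1") & 1

def ldt(sbox, n):
    """Map every mask pair (a, b) with non-zero bias to its signed bias."""
    size = 1 << n
    table = {}
    for a in range(size):
        for b in range(size):
            agree = sum(parity(a & x) == parity(b & sbox[x]) for x in range(size))
            if agree != size // 2:
                table[(a, b)] = (agree - size // 2) / size
    return table

def matches(partial, value):
    """True if the concrete mask 'value' respects all fixed bits of 'partial'."""
    bits = format(value, f"0{len(partial)}b")
    return all(p in ("?", c) for p, c in zip(partial, bits))

def merge(values, n):
    """Keep a bit position only if all candidate masks agree on it."""
    columns = zip(*(format(v, f"0{n}b") for v in values))
    return "".join(col[0] if len(set(col)) == 1 else "?" for col in columns)

def propagate_sbox(table, alpha, beta):
    """Return None on contradiction, otherwise the refined (alpha, beta)."""
    n = len(alpha)
    remaining = [(a, b) for (a, b) in table if matches(alpha, a) and matches(beta, b)]
    if not remaining:
        return None
    return merge([a for a, _ in remaining], n), merge([b for _, b in remaining], n)

SBOX = [0, 1, 3, 6, 7, 4, 5, 2]   # arbitrary 3-bit example permutation
TABLE = ldt(SBOX, 3)

print(propagate_sbox(TABLE, "000", "001"))  # outcome 1: contradiction
print(propagate_sbox(TABLE, "000", "???"))  # outcome 2: updated bits
print(propagate_sbox(TABLE, "???", "???"))  # outcome 3: no updates
```

For S-boxes of up to 8 bits the table has at most 2^16 entries, so this exhaustive filtering stays cheap, which is the reason the brute-force approach is viable here.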

Propagation in the Linear Layer. There are two main differences between 
the linear and non-linear layers from the propagation perspective: On the one 
hand, the linear layer typically involves significantly more variables than individual 
S-boxes. On the other hand, propagating partial linear masks for linear functions 
can be achieved easily using basic linear algebra. 

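Assume that the function $f : \mathbb{F}_2^m \to \mathbb{F}_2^n$ is linear, i.e., $f(x) = A \cdot x = y$ for 
some $A \in \mathbb{F}_2^{n \times m}$. Note that we can include affine linear functions in the same 
model, since the affine (constant) part is irrelevant for the bias of the linear 
model if we do not consider the sign of the bias. Then, for a fully determined 
mask $\alpha, \beta$, the bias $\varepsilon_{\alpha,\beta}$ is either 0 (wrong model) or $\frac{1}{2}$ (exact, correct model). 
More specifically, $\alpha, \beta$ is a valid input-output mask if 

$$
\begin{aligned}
\forall x : \alpha^\top \cdot x = \beta^\top \cdot f(x) &\Leftrightarrow \forall x : \alpha^\top \cdot x = \beta^\top \cdot (A \cdot x) \\
&\Leftrightarrow \forall x : \alpha^\top \cdot x = (A^\top \cdot \beta)^\top \cdot x \\
&\Leftrightarrow \alpha = A^\top \cdot \beta.
\end{aligned}
$$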

If $\alpha$ and $\beta$ are only partially determined, all propagation can be performed by 
propagating the information in the linear system $\alpha = A^\top \cdot \beta$. For this purpose, 
we always keep the half-solved system in reduced row echelon form for all linear 
layers. Whenever mask values in $\alpha$ or $\beta$ are updated, we perform partial Gaussian 
elimination to retain reduced row echelon form. If in the process, other bits of $\alpha$ 
or $\beta$ are determined (case 1 or 2 from above), this information is extracted from 
the system and instead stored in the regular representation $\alpha, \beta$ of the mask bits 
that is also used for S-box propagation. 
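
The relation between the input mask and the transposed matrix applied to the output mask can be checked by brute force on a small example. The sketch below uses a hypothetical 4 x 3 binary matrix and verifies that the derived input mask gives bias 1/2 while a different input mask gives bias 0. It only illustrates the fully determined case; it does not implement the partial Gaussian elimination described above.

```python
from fractions import Fraction
from itertools import product

def dot(u, v):
    """Inner product over GF(2)."""
    return sum(a & b for a, b in zip(u, v)) & 1

def mat_vec(A, x):
    """Matrix-vector product A . x over GF(2); A is a list of rows."""
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def input_mask(A, beta):
    """Input mask alpha = A^T . beta that matches the output mask beta."""
    return mat_vec(transpose(A), beta)

def bias(A, alpha, beta):
    """Bias of alpha . x = beta . (A . x), by enumerating all inputs x."""
    m = len(A[0])
    agree = sum(dot(alpha, x) == dot(beta, mat_vec(A, x))
                for x in product((0, 1), repeat=m))
    return Fraction(agree, 2 ** m) - Fraction(1, 2)

# Hypothetical linear map from 3 input bits to 4 output bits.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1],
     [0, 0, 1]]
beta = [1, 0, 1, 1]
alpha = input_mask(A, beta)

print(alpha, bias(A, alpha, beta))      # matching mask: exact relation
print(bias(A, [1, 1, 0], beta))         # any other input mask: no bias
```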

Update Process. Every time the propagation step leads to new, explicit infor- 
mation in the linear masks (i.e., mask bits that were previously undetermined 
are now fixed, case 2), this information has to be propagated over the connected 
linear or non-linear layers, which share those updated mask bits. In other words, 
the propagation step needs to be iterated to update the neighbouring layers. 
Since we require that every linear layer is only connected to non-linear layers and 
vice versa, we can use a very simple update process scheduling: After each guess 
or update, we first perform propagation on all non-linear layers (with updated 
bits), then on all linear layers (with updated bits). This process is iterated until 
the propagation process has converged and no additional explicit information 
can be learned anymore, or a contradiction was detected. 

3.3 Search for Linear Characteristics 

In this section, we discuss our proposed search strategy. The search strategy 
guides the guessing behavior (choice of bits or bit sets to guess, and their val- 
ues), as well as the backtracking behavior after detecting contradictions. We 
currently implement a simple stack-based search algorithm, similar to the strat- 
egy used in recent tools for differential characteristic search [16,17]. We first 
give an algorithmic overview, before detailing the choices made for individual 
ingredients. 


Basic Search Algorithm. We start from a mostly undetermined characteristic 
$A_0$ as a starting point, and incrementally guess more and more of its mask bits. 
We refer to the current characteristic as A, and keep a history of the guesses 
that led from $A_0$ to A in the stack S. For each guess, we select a guessable item 
X in the current characteristic A. Depending on the search strategy, this can 
be a single bit, or all bits of an S-box input-output mask (unlike some tools for 
differential characteristic search, which only consider individual bits). The choice 
of X is guided by the search and backtracking strategy. The characteristics 
stored in S are used for backtracking, where some of the most recent guesses are 
undone to resolve conflicts, i.e., we return to an older status stored in S. The 
basic search algorithm is summarized in Algorithm 1. 


Algorithm 1. Guess-and-determine search algorithm 

    choose characteristic A_0 as starting point 
    loop 
        push A_0 to empty stack S 
        repeat 
            get the topmost characteristic A from S 
            select a guessable item (bit or S-box) X in A 
            for all most preferable possible values x of X do 
                guess X to x 
                propagate information 
                if contradiction detected then 
                    undo guess x and all resulting updates 
                else 
                    push A to S and break 
            if no valid assignment x was found then 
                backtrack by popping characteristics from S 
                mark X critical 
        until exhausted or solution characteristic found 
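
The inner repeat loop of Algorithm 1 can be sketched as follows. This is a simplified illustration under several assumptions: a characteristic is modeled as a tuple whose undetermined entries are None, the propagation and the ranking of candidate values are supplied by the caller, the outer restart loop is omitted, and the search simply gives up after a fixed number of steps. The toy constraint at the end is invented for demonstration only.

```python
import random

def search(start, propagate, candidates, rng, max_steps=10_000):
    """Stack-based guess-and-determine; returns a full tuple or None."""
    stack = [start]
    critical = None
    for _ in range(max_steps):
        if not stack:
            return None                       # search space exhausted
        current = stack[-1]
        open_items = [i for i, v in enumerate(current) if v is None]
        if not open_items:
            return current                    # fully determined solution
        # A critical item (one that just failed) gets top priority.
        item = critical if critical in open_items else rng.choice(open_items)
        critical = None
        for value in candidates(current, item):   # most preferable first
            guess = current[:item] + (value,) + current[item + 1:]
            refined = propagate(guess)            # None means contradiction
            if refined is not None:
                stack.append(refined)
                break
        else:
            stack.pop()                       # backtrack to an older state
            critical = item
    return None

def toy_propagate(ch):
    """Toy constraints: x1 must be 1 and x0 must equal x1; x2 is free."""
    x0, x1, _ = ch
    if x1 == 0:
        return None
    if x0 is not None and x1 is not None and x0 != x1:
        return None
    return ch

result = search((None, None, None), toy_propagate,
                lambda ch, item: (0, 1), random.Random(1))
print(result)
```

Because the toy propagation only detects the conflict between x0 and x1 once both are guessed, a search that starts with x0 = 0 has to pop the stack and retry the critical item x1 from an older characteristic, which mirrors the backtracking behavior described in this section.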


Choice of the Starting Point. The starting point is a linear characteristic, 
in which most mask bits are still undetermined. The appearance of the starting 
point depends highly on the scenario in which the linear characteristic will be 
used, since it can be used to define which bits of the resulting characteristic must 
definitely be active or inactive. 

For instance, consider the search for a linear characteristic for a block cipher 
or a permutation. In principle, every bit of the input or the respective output 
mask can be active in such a scenario. So, we can use a starting point where 
nearly every bit of the respective input and output linear masks is free for guess- 
ing during the search. On the other hand, if we consider sponge-like modes, we 
have more restrictions on the characteristic. Here, the attacker can only observe 
or control a fraction of the state on the input and the output. Depending on the 
actual attack, it can be necessary that bits belonging to unknown parts of the 
state remain inactive, and that only observable or controllable bits are active. 

Besides defining the possible use-cases of the linear characteristic, the choice 
of the starting point also greatly influences the expected success of the search. 
By fixing parts of the starting point, it is possible to reduce the search space 
significantly, and thus accelerate the search to quickly find results that would 
otherwise be out of range. However, reducing the search space also has the 
potential to exclude classes of good characteristics. Thus, the starting point 
is usually not too much restricted at the beginning of the analysis of a certain 
cipher. Instead, the choice of the starting point is an adaptive process based on 
the cryptographer’s intuition and the cipher’s structure, using information from 
previous searches. 

Guessing Strategy. The guessing strategy specifies which undetermined bit 
or S-box is picked next for guessing, and how it will be refined. In S-box-based 
designs, the search success can profit significantly from guessing in an S-box- 
oriented manner; that is, by guessing the value of all bits in an S-box input- 
output mask at once. We refer to this as “guessing the S-box”. Even if guesses 
are made S-box after S-box, the propagation procedure can produce half-guessed 
S-boxes with some bits fixed and others undetermined. It is also possible to mix 
S-box-wise and bit-wise guessing. 

We refer to an S-box as “guessable” if the linear input and output masks 
contain at least one remaining undetermined ?-bit, and “fixed” or “not guess- 
able” otherwise. In addition, the search configuration may limit the selection 
of S-boxes currently eligible for guessing, depending on the guessing progress. 
The most important example for this is the “critical” status that is assigned 
to an S-box after a failed attempt to find any valid assignment for this S-box, 
and assigns top priority to this S-box. Additionally, it can be useful to impose 
cipher-specific rules; for example, to demand that all S-boxes of the first few 
rounds must be fixed before we start guessing values in the following rounds. 

To guide the guessing procedure, each guessable S-box is assigned a proba- 
bility for being selected as the next guessing target, for example based on the 
criteria described above. In addition, all possible assignments for a guessable 
S-box are ranked by how promising they are estimated to be for high-bias char- 
acteristics. Of course, the primary guess for potentially inactive S-boxes (i.e., 
only with bits 0 and ? so far) is to keep them inactive (i.e., all 0). If this is not 
possible, the S-box is marked as active. If the selected guessable S-box is already 
marked active, we rank all possible assignments of the linear masks according 
to their linear bias and the number of active bits. We pick a random optimally 
ranked assignment as primary guess. If the following propagation reveals that 
this assignment is in fact impossible, we try other assignments until no alterna- 
tive is left, or we have reached a predefined threshold on the number of trials. 


Backtracking. If all alternative assignments fail (or a predefined threshold of 
trials is reached), we need to backtrack. To resolve this conflict, we return to an 
earlier version of the linear characteristic as stored on the stack S. Again, we try 
to guess the same critical S-box that caused the conflict. If we cannot resolve 
the conflict here, we jump further back, until it can be resolved. 


Restarts. To better randomize the search process and avoid being “stuck” with 
a few unhappy first guesses, it is helpful to occasionally restart the complete 
search. For this purpose, we define a limit of “credits” or resources for one 
search run. When this limit is exhausted before finding a valid, fully determined 
characteristic, we clear the stack S and restart from scratch with the starting 
point $A_0$. Additionally, the search is also restarted after completing a successful 
run, with the hope of finding new, better characteristics. If the cryptographer 
detects promising patterns in the preliminary results, these can serve as a basis 
for improved starting points for the next run. 

4 Application to CAESAR Candidates 

In this section, we demonstrate the advantages of our tool for linear cryptanalysis 
by applying it to several first round CAESAR candidates: Keyak, Ascon, 
ICEPOLE, PRØST, and Minalpher. All the analyzed candidates are permutation- 
based (rather than based on block ciphers). This is, however, not a constraint 
of the linear tool, which works just as well for block ciphers, since the typical 
round-key additions do not influence the linear characteristics. Rather, it is rep- 
resentative of the trend that a significant portion of CAESAR candidates with 
new, dedicated SPN primitives are permutation-based, since most block-cipher 
modes employ AES. 

For each candidate, we first consider linear characteristics for the (round- 
reduced) permutation. However, for many modes (in particular for sponges), 
an attacker cannot influence the complete input to the permutation, or cannot 
observe its complete output. For this reason, we also investigate characteristics 
with additional constraints, where parts of the linear masks are fixed beforehand. 
We define the following three types of linear characteristics: 

- Type I (Permutation): For this type of characteristics, we do not require 
any additional restrictions regarding the positions of active bits in the linear 
characteristic. Hence, a characteristic of this type might not be usable in 
a concrete attack on the duplex-like constructions of Keyak, Ascon, and 
ICEPOLE. Nevertheless, even for modes where Type-I characteristics allow 
no direct attacks, they still give insight into the resistance of the cryptographic 
primitive against linear attacks. 

- Type II (Output Constrained): Linear characteristics of this type have 
the restriction that all active bits at the end of the characteristic have to be 
“observable”. For duplex-like constructions, this means that all active mask 
bits have to be in the outer (rate) part of the state. Such linear characteristics 
can be used to create key-stream distinguishers in known-plaintext scenarios 
for duplex-like constructions, or even to perform key-recovery attacks. 

- Type III (Input and Output Constrained): In addition to the Type-II 
restriction, all active bits of the input also have to be in the outer (rate) part 
of the state. This type of linear characteristic can act as a key-stream 
distinguisher in known-plaintext scenarios for duplex-like constructions, targeting 
the encryption of the plaintext. A similar type of linear relation has been 
used, for instance, by Minaud [18] to detect linear biases in the key-stream of 
the CAESAR candidate AEGIS. 

We first discuss our approach and our findings for Keyak in more detail, and 
then briefly present our results for Ascon, ICEPOLE, PRØST, and Minalpher. 

4.1 Keyak 

Brief Description of Keyak. Keyak is a family of authenticated encryption 
algorithms designed by Bertoni et al. [4] and is one of the 57 submissions to 
the first round of the CAESAR competition. It is based on the round-reduced 
Keccak-f permutation and follows the duplex construction [2]. The designers 
have defined four instances of Keyak; all instances share the same capacity 
c = 252 and use 12 rounds of the Keccak-f permutation, but differ in their 
state size b and the degree of parallelism p: 

- River Keyak: b = 800, p = 1 (serial), 

- Lake Keyak: b = 1600, p = 1 (serial), 

- Sea Keyak: b = 1600, p = 2 (parallel), 

- Ocean Keyak: b = 1600, p = 4 (parallel). 


The Keyak Duplex Mode. Figure 1 sketches the encryption of serial Keyak 
without associated data: The initialization takes as input the secret key K and 
public nonce N, and applies the permutation f once. This ensures that one 
always starts with a random-looking state at the beginning of the encryption of 
the plaintext. Afterwards, the plaintext is processed by xoring it block-wise to 
the internal state, separated by invocations of the permutation f. The ciphertext 
blocks are extracted from the state after adding the plaintext. After all data is 
processed, the finalization applies the permutation f once more and returns the 
tag. For more details on Keyak, including the rules for processing associated 
data, we refer to the specification [4]. 

Fig. 1. Simplified sketch of Lake Keyak encryption (without associated data). 


The Keyak Permutation. The Keyak permutation is a round-reduced version of 
the Keccak-f permutation, reduced to 12 rounds. It operates on the 5 x 5 = 25 
w-bit words (“lanes”) S[x][y][*] of the state S, with w = 32 or 64. Each round 
applies the five steps $R = \iota \circ \chi \circ \pi \circ \rho \circ \theta$, where all steps except $\iota$ are equivalent 
for each round. 

- Step $\theta$ adds to every bit of the state S[x][y][z] the bitwise sum of the 
neighbouring columns S[x - 1][*][z] and S[x + 1][*][z - 1]. This procedure can be 
described by the following equation: 

$$\theta : S[x][y][z] \leftarrow S[x][y][z] + \sum_{y'=0}^{4} S[x-1][y'][z] + \sum_{y'=0}^{4} S[x+1][y'][z-1].$$

- Step $\rho$ rotates the bits in every lane by a constant value, 

$$\rho : S[x][y][z] \leftarrow S[x][y][z + C(x,y)],$$

where C(x,y) is a constant value. 

- Step $\pi$ permutes the lanes using the following function: 

$$\pi : S[x][y][*] \leftarrow S[x'][y'][*], \quad \text{where } \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 2 & 3 \end{pmatrix} \cdot \begin{pmatrix} x' \\ y' \end{pmatrix}.$$

- Step $\chi$ is the only non-linear step in Keccak and operates on each row of 
5 bits: 

$$\chi : S[x][y][z] \leftarrow S[x][y][z] \oplus ((\neg S[x+1][y][z]) \wedge S[x+2][y][z]),$$

which can be seen as applying a 5-bit S-box in parallel to all rows. 

- Step $\iota$ adds a round-dependent constant to the state. For the actual values of 
the constants, we refer to the design document [4]. 
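
To make the S-box view of the non-linear step concrete, the following sketch tabulates it as a 5-bit lookup table and computes the largest bias over all non-trivial mask pairs. The bit ordering (bit x of the table index corresponds to position x in the row, with indices taken modulo 5) is a convention chosen here for illustration, and the helper names are hypothetical.

```python
def chi_row(row):
    """Apply the non-linear step to one row [a0, ..., a4] of five bits."""
    return [row[x] ^ ((row[(x + 1) % 5] ^ 1) & row[(x + 2) % 5])
            for x in range(5)]

def to_bits(v):
    """Integer to bit list; bit x of v is row position x (assumed ordering)."""
    return [(v >> x) & 1 for x in range(5)]

def from_bits(bits):
    return sum(b << x for x, b in enumerate(bits))

CHI_SBOX = [from_bits(chi_row(to_bits(v))) for v in range(32)]

def parity(v):
    return bin(v).count("1") & 1

def max_bias(sbox, n=5):
    """Largest absolute bias over all masks with a non-zero output mask."""
    size = 1 << n
    best = 0
    for a in range(size):
        for b in range(1, size):
            agree = sum(parity(a & x) == parity(b & sbox[x]) for x in range(size))
            best = max(best, abs(agree - size // 2))
    return best / size

print(CHI_SBOX)
print(max_bias(CHI_SBOX))
```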

The designers provide some results on the linear properties of this permutation 
online, as part of the KeccakTools package [3]. 

Results for Keyak. For our analysis, we focus on the primary recommendation 
Lake Keyak using state size b = 1600. Since Lake Keyak, in contrast to 
Ascon and ICEPOLE, uses the same permutation (with the same number of 
rounds) in the initialization, finalization, and plaintext-processing phase, Type- 
III characteristics (to target plaintext-processing) offer no remarkable advantage 
over Type-II characteristics (to target the initialization). For this reason, we only 
consider Type-I and Type-II characteristics. 

Type-I Characteristics (for 3 and 4 Rounds of the Permutation). We first consider 
Type-I characteristics, i.e., linear characteristics for the underlying 
round-reduced Keyak permutation (Keccak-f) without any additional restrictions. 
We performed a search for linear characteristics for 4 rounds of the 1600-bit 
permutation. The best linear characteristic we found has 33 active S-boxes, which 
results in a bias of $2^{-34}$. The best linear characteristic for 3 rounds, with 13 active 
S-boxes and a bias of $2^{-14}$, can be obtained by omitting the first round of the 
4-round linear characteristic. Our results are very similar to the characteristic 
given in the KeccakTools package [3]. 
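
The reported biases are consistent with the piling-up lemma under the assumption (ours, not stated explicitly in the text) that every one of the $n$ active S-boxes uses a transition with bias $2^{-2}$ and that the S-boxes behave independently:

$$
\varepsilon = 2^{n-1} \prod_{i=1}^{n} \varepsilon_i = 2^{n-1} \cdot \left(2^{-2}\right)^{n} = 2^{-(n+1)}, \qquad n = 33 \Rightarrow \varepsilon = 2^{-34}, \qquad n = 13 \Rightarrow \varepsilon = 2^{-14}.
$$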

Type-II Characteristics (for 3 and 4 Rounds of the Initialization). The previous 
3 and 4-round characteristics have active bits in the inner part (last four 64-bit 
words) of the state after round 4. Therefore, we cannot use this characteristic in 
an actual attack. For an attack on the initialization of round-reduced Keyak, we 
have to apply additional restrictions on the linear characteristics. Since we can 
only observe the outer (rate) part of the state at the output of the permutation 
after the initialization, we apply the restriction that only this part is allowed to 
contain active bits. Note that the input of the first permutation call is either 
known or constant. Therefore, we have no problems with active bits there. 

For the initialization reduced to 3 or 4 rounds, we found characteristics 
which only have active S-boxes in the rate part of the state. Thus, considering 
a known-plaintext attack, we know all the output bits of these S-boxes and can 
invert them. Consequently, the last round does not influence the 
bias. So we have an expected bias of $2^{-13}$ for the 3-round version, and $2^{-49}$ for 
the 4-round version of these characteristics. Taking the last S-box layer also into 
account, the bias of those characteristics would be $2^{-26}$ and $2^{-70}$, respectively. 
When inverting the last S-boxes, both characteristics result in trivial key-stream 
distinguishers for round-reduced versions of Keyak with complexity $2^{26}$ and $2^{98}$, 
respectively. Moreover, these distinguishers could also be used in a key-recovery 
attack on round-reduced Keyak, resulting in similar complexities. 
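
The stated complexities follow from the rule that the required data is proportional to the inverse square of the bias. Dropping the constant factor, a worked check reads:

$$
N \approx \varepsilon^{-2}: \qquad \left(2^{-13}\right)^{-2} = 2^{26}, \qquad \left(2^{-49}\right)^{-2} = 2^{98}.
$$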

Configuration of the Search. As already mentioned, the proposed search tool is a 
heuristic one and thus, the quality of the results heavily depends on the applied 
heuristic search criteria, as well as on the definition of the starting points. For 
the search process that led to the Type-II characteristics for 3 and 4 rounds of 
Keyak, we used a quite natural starting point: For both starting points, the 
only restriction is that the S-boxes of the last round which “belong” to the inner 
part of the state must be inactive. In addition, one S-box in the second round is 
marked as active (to exclude the trivial, entirely inactive solution). 

We split the search into two stages. In the first stage, we only pick poten- 
tially inactive guessable S-boxes, and set them to the best possible assignment 
(typically a completely inactive input and output linear mask). Which S-box is 
picked and refined is determined by a heuristic that picks the S-boxes according 
to a previously configured weight distribution. These weights can be manually 
assigned in the search configuration file (the same file in which the starting 
point is defined). In case of the search for the 3-round Type-II characteristic, 
the weights were assigned so that S-boxes of the first and second round have a 
50 times higher chance to be picked compared to an S-box of the last round. 
The intention behind this distribution is that the majority of the active S-boxes 
of the resulting linear characteristic should be located in the last round, because 
their output is known in an attack. Hence, these active S-boxes can be inverted 
and do not contribute to the bias. Our heuristic for the 4-round Type-II char- 
acteristic prefers S-boxes from rounds 2 and 3 over S-boxes from rounds 1 and 
4. Additionally, round 1 is favored over the last round 4. In the second stage, 
after every guessable and potentially inactive S-box is already determined, we 
continue by guessing active but not yet fully determined S-boxes until the linear 
characteristic is fully determined. 
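
The weighted choice of the next S-box in the first stage can be sketched with a weighted random draw. The weights below mirror the 3-round configuration described above (rounds 1 and 2 are 50 times more likely than round 3); the data layout, with S-boxes given as (round, index) pairs, and the function name are assumptions made for this illustration and do not reflect the tool's configuration format.

```python
import random

def pick_sbox(guessable, round_weight, rng):
    """Pick one guessable S-box, given as (round, index), by round weight."""
    weights = [round_weight[rnd] for rnd, _ in guessable]
    return rng.choices(guessable, weights=weights, k=1)[0]

ROUND_WEIGHT = {1: 50, 2: 50, 3: 1}          # hypothetical weight table
guessable = [(1, 0), (2, 7), (3, 3)]          # hypothetical guessable S-boxes
rng = random.Random(2015)

counts = {box: 0 for box in guessable}
for _ in range(10_000):
    counts[pick_sbox(guessable, ROUND_WEIGHT, rng)] += 1
print(counts)
```

With such a distribution, S-boxes of the last round tend to be decided late, so they are more likely to end up active, which is the intended effect for Type-II characteristics.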

As can be seen above, the choice of the starting point and search heuristic 
depend on the structure of the target primitive, the planned use for the linear 
characteristic, and on the intuition of the cryptographer. Thus, better search 
strategies and starting points might exist, which may lead to better linear char- 
acteristics than those given in this paper. 

4.2 Ascon 

Brief Description of Ascon. Ascon is a family of sponge-based candidates, 
designed by Dobraunig et al. [9]. Compared to Keyak, it features a signifi- 
cantly smaller state of 320 bits, and the linear layer is applied to each of the 5 
64-bit words independently. The 5-bit S-box, on the other hand, is closely related 
(affine equivalent) to that of Keyak. The primary proposal Ascon-128 has a 
rate of 64 bits and hence, a capacity of 256 bits. 

Results for Ascon. For the linear tool, the simple design of the linear layer 
means that its linear model can be split into 5 separate, independent matrices. 
Combined with a small state size, this property greatly reduces the cost for linear 
algebra needed to perform the propagation compared to Keyak. 

Our findings for Ascon are summarized in Table 1. The number of active 
S-boxes of Type-I characteristics found with the help of this tool has already 
been included in work presented at CT-RSA 2015 [10]. Note that the characteristics 
given here are optimized for a minimum number of active S-boxes, rather 
than maximal bias. For Ascon-128, we additionally searched for Type-II and 
Type-III characteristics. However, regarding Type-III characteristics, no meaningful 
results were obtained. 

4.3 ICEPOLE 

Brief Description of ICEPOLE. ICEPOLE is a family of authenticated 
encryption schemes designed by Morawiecki et al. [19]. It consists of the three 
proposals ICEPOLE-128, ICEPOLE-128a, and ICEPOLE-256a, which all use 
the same underlying 1280-bit permutation. All variants use 12 rounds of the 
permutation for initialization, and 6 rounds for processing of plaintext and final- 
ization. However, they differ in details like size of the rate, key, nonce and tag. 

The 1280-bit state of ICEPOLE is stored in 5 x 4 = 20 64-bit words. For 
the linear layer, an MDS matrix over $\mathbb{F}_{2^5}$ is first applied 64 times in parallel (to 
each 20-bit slice of the state). Then, each word is rotated, and the words swap 
positions. The S-box layer applies 5-bit S-boxes (4 parallel row-wise applications 
for each 20-bit slice). 

ICEPOLE’s designers perform no dedicated linear analysis, but compare the 
cipher’s resistance to linear cryptanalysis to its well-studied resistance against 
differential cryptanalysis. They conclude that the attack complexity after 5-6 
rounds should be “completely intractable” [19]. At FSE 2015, Huang et al. [11] 
presented 3-round linear characteristics that they use in a differential-linear 
attack on ICEPOLE. 

Results for ICEPOLE. The Type-II and Type-III characteristics given in 
Table 1 are constrained with respect to a capacity of 254 bits (due to padding, 
256 bits are not observable), as defined for ICEPOLE-128 and ICEPOLE-128a. 
In the case of ICEPOLE, we do not have an immediate output of a ciphertext 
block right after the 12 rounds of the initialization. Before this happens, there is 
the option to process a secret message number and at least an empty associated 
data block is processed. Hence, 6 or even another 12 additional rounds have to 
be passed before an output suitable for our Type-II characteristic is accessible. 
Thus, in the worst case, our key-stream distinguisher using Type-II characteristics 
works for round-reduced versions of ICEPOLE-128, where the initialization 
plus the following processing is reduced to 5 out of 24 rounds, with a complexity 
of about $2^{120}$. 

Type-III characteristics can be used to create distinguishers that target the 
processing of the plaintext. Here, every version of ICEPOLE uses the 6-round 
version of the ICEPOLE permutation. Thus, by using the Type-III characteristic 
in Table 1, the key-stream produced by round-reduced variants of ICEPOLE-128, 
where the permutation used in the plaintext processing is reduced to 4 (out of 
6) rounds, can be distinguished from a perfectly random key-stream 
with a complexity of about $2^{88}$. The bias of the 5-round Type-III characteristic is 
$2^{-87.08}$ and hence, the complexity of a resulting key-stream distinguisher cannot 
harm the 128-bit security of ICEPOLE-128. ICEPOLE-256a, on the other hand, 
claims a security level of 256 bits regarding confidentiality. However, it has a 
higher capacity of 318 bits and therefore, the characteristics given in Table 1 cannot 
be used directly. Taking the higher capacity of ICEPOLE-256a into account, 
we get a Type-III characteristic with a bias of $2^{-89.49}$, which can be used to 
distinguish the key-stream of a round-reduced variant of ICEPOLE-256a, where 
the permutation used during the encryption is reduced to 5 (out of 6) rounds. 
Note that ICEPOLE-256a limits the number of blocks encrypted under a single 
key to $2^{62}$. However, this type of key-stream distinguisher exploits relations 
between ciphertext block $C_i$ and the key-stream used to generate the following 
ciphertext block $C_{i+1}$. Thus, distinguishers using Type-III characteristics in this 
way do not rely on the fact that the same key is always used. 

Table 1 contains the results with the best bias, but not necessarily the
minimal number of active S-boxes we found. For example, for 6 rounds, we also
found a Type-I characteristic with only 103 active S-boxes, but a bias of
$2^{-133.49}$ (compared to 104 active S-boxes with bias $2^{-126.32}$ as in the
table).

4.4 Minalpher 

Brief Description of Minalpher. Minalpher is designed by Sasaki et al. [22].
In contrast to the previous 3 candidates, Minalpher is not a sponge-based
design. Instead, the permutation is applied in a new tweakable block cipher
construction, called tweakable Even-Mansour. For this construction, the
permutation size only needs to be twice the security level, so for 128-bit
security, Minalpher has the smallest of all investigated permutation sizes with
only 256 bits. This small state is further divided into two halves, whose only
interaction in each of the 17.5 rounds is that one half is once xored to the
other half, and the two halves swap places. Besides the interaction between the
halves and some nibble reordering, the linear layer features a near-MDS matrix
multiplication over $\mathbb{F}_{2^4}$. The S-box size of 4 bits is also
nibble-oriented.

For Minalpher’s construction, only Type-I characteristics are useful. We 
understand our results as an analysis of the underlying permutation. However, 
since Minalpher claims security in nonce misuse settings and under unverified 
plaintext release, the Type-I characteristics could also be useful for attacks on 
the cipher. In particular, for a fixed nonce, the construction allows an
attacker to control the entire permutation input (at least differentially, due
to the Even-Mansour construction, which xors a key- and nonce-dependent value
before and after the permutation) and to observe the entire output (again,
differentially).

The designers analyze the minimum number of active S-boxes (for differen- 
tial cryptanalysis) theoretically, and prove a minimum number of 22 S-boxes for 
4 rounds. For up to 7 rounds, they extend the bounds with the help of mixed 
integer linear programming (MILP). The bounds obtained this way for the num- 
bers of rounds r also covered by this work are 22 (r = 4), 41 (r = 5), and 58 
(r = 6). The designers claim that the same bounds apply for linear cryptanalysis. 

Results for Minalpher. The existing bounds serve as a kind of benchmark 
for our tool to check its capabilities. As shown in Table 1, we were able to match 
the given bounds for up to 6 rounds. For better comparability, we included our 
results with the minimal number of active S-boxes, but not necessarily the best 
bias, in the table. For example, for 6 rounds, we also found a Type-I
characteristic with a better bias of $2^{-61}$, but with 60 active S-boxes
(compared to 58 active S-boxes with bias $2^{-62}$ in the table).
4.5 Prøst

Brief Description of Prøst. Prøst, designed by Kavun et al. [13], offers
both a sponge-based mode and two block-cipher-based modes, where the latter
use the Prøst permutation in a single-key Even-Mansour construction. Each
of the three modes offers two security levels: one based on the 256-bit
Prøst-128 permutation, and one based on the 512-bit Prøst-256 permutation. The
state is stored as 4 x 4 words of 16 or 32 bits, respectively. Both the 4-bit
S-box (row-wise) and the 16-bit linear mixing function (MDS over
$\mathbb{F}_{2^4}$) are applied in a bit-sliced way (using 1 bit of each word).
Then, each word is rotated. The number of rounds per permutation call is
r = 16 (Prøst-128) or r = 18 (Prøst-256), respectively.


Table 1. Results for Keyak, Ascon, ICEPOLE, Minalpher, and Prøst. The
corresponding linear characteristics can be found in the full version of this
paper.

Cipher      Type  Rounds  Active S-boxes  Bias
Keyak       I     3       13              $2^{-14}$
                  4       33              $2^{-34}$
            II    3*      12              $2^{-13}$
                  4*      43              $2^{-49}$
Ascon       I     3       13              $2^{-15}$
                  4       43              $2^{-50}$
                  5       67              $2^{-94}$
            II    2       6               $2^{-8}$
                  3       23              $2^{-30}$
                  4       61              $2^{-83}$
ICEPOLE     I     5       38              $2^{-55.08}$
                  6       104             $2^{-126.32}$
            II    4       22              $2^{-30.42}$
                  5       38              $2^{-59.49}$
            III   3       10              $2^{-16.66}$
                  4       22              $2^{-43.25}$
                  5       42              $2^{-87.08}$
Minalpher   I     4       22              $2^{-23}$
                  5       41              $2^{-42}$
                  6       58              $2^{-62}$
Prøst-256   I     4       25              $2^{-26}$
                  5       41              $2^{-42}$
                  6       105             $2^{-107}$
                  7       169             $2^{-175}$

* Last S-box layer inverted.
We focus our analysis on Prøst-256 (formally offering 128-bit security). Like
Minalpher, Prøst comes with a MILP-based proof for the minimum number of
active S-boxes for differential and linear characteristics. For Prøst-256, the
bounds for different round numbers are 25 (r = 4), 41 (r = 5), 105 (r = 6), and
169 (r = 7).


Results for Prøst. Again, we used the existing bounds as benchmarks for
our linear tool. The tool is able to match each bound, mostly with optimal or
near-optimal bias ($2^{-26}$ for r = 4, $2^{-42}$ for r = 5, $2^{-107}$ for
r = 6, and $2^{-175}$ for r = 7).
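
As a rough sanity check on what "optimal or near-optimal" means here, the piling-up lemma gives the bias of a characteristic from the biases of its active S-boxes. The following sketch is ours and not part of the tool: it assumes, for illustration only, that every active S-box contributes a bias of $2^{-2}$ (the best value an optimal 4-bit S-box can offer), and the helper names are hypothetical.

```python
def piling_up_log2_bias(log2_sbox_biases):
    """log2 of the combined bias 2^(n-1) * prod(eps_i) given by the piling-up lemma."""
    n = len(log2_sbox_biases)
    return (n - 1) + sum(log2_sbox_biases)


def best_case_log2_bias(active_sboxes, log2_best_sbox_bias=-2):
    """Best case when every active S-box has bias 2^-2 (an assumption, see above)."""
    return piling_up_log2_bias([log2_best_sbox_bias] * active_sboxes)


# (rounds, active S-boxes, log2 of the bias reported for Proest-256)
for rounds, active, reported in [(4, 25, -26), (5, 41, -42), (6, 105, -107), (7, 169, -175)]:
    print(rounds, active, best_case_log2_bias(active), reported)
```

Under this assumption, n active S-boxes give a best-case bias of $2^{-(n+1)}$, so the 4- and 5-round results coincide with the best case, while the 6- and 7-round results fall slightly short of it.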

5 Conclusion 

We presented a dedicated tool for the automatic linear cryptanalysis of substi- 
tution-permutation networks. The goal of the tool is to identify linear charac- 
teristics for a cryptographic function, which can subsequently be used by the 
cryptanalyst to mount key-recovery or distinguishing attacks. The heuristic 
search is based on an efficient guess-and-determine approach, which has pre- 
viously been proven successful for searching differential characteristics. We 
described how to perform efficient propagation of linear masks in linear and 
non-linear building blocks of a cipher. 

From the cryptanalyst’s perspective, the tool is simple to use, flexible, and 
easy to extend with regard to search strategies and target ciphers. The open- 
source tool will be freely available to help analyze CAESAR candidates and other 
symmetric cryptographic primitives. We hope that our work will be a valuable 
contribution to get a better understanding of the security of these ciphers regard- 
ing linear cryptanalysis. In particular, we hope to encourage experiments with 
alternative, sophisticated search strategies optimized for different target ciphers. 

We demonstrated the efficiency of our tool by applying it to several CAESAR
candidates. The results obtained by searching for linear characteristics for the
Minalpher and Prøst-256 permutations show that the presented heuristic search
tool can keep pace with MILP-based approaches. However, due to the heuristic
nature, we are not capable of providing bounds on the minimum number of
active S-boxes.

On the other hand, when looking at the results obtained for Ascon, ICEPOLE,
and Keyak (all designs with weak alignment), we have been able to find new
linear characteristics with a good bias that might be used in a key-recovery or
distinguishing attack on round-reduced versions of the ciphers in the future.
One highlight is the set of Type-III characteristics for round-reduced versions
of ICEPOLE, which can be used to distinguish the key-stream of ICEPOLE in
a nonce-respecting scenario.

Our results show that the existence of a publicly available analysis tool for lin- 
ear characteristics will be of great help in the design of symmetric cryptographic 
primitives, in order to evaluate the resistance against linear attacks already in an 
early stage of the design. Thus, we think that this tool will facilitate new designs 
which are more balanced in their resistance against linear and differential attacks 
than some of today’s designs. 
Abstract. In this paper we study authenticated encryption algorithms
inspired by the OCB mode (Offset Codebook). These algorithms use secret
offsets (masks derived from a whitening key) to turn a block cipher into a
tweakable block cipher, following the XE or XEX construction.

OCB has a security proof up to $2^{n/2}$ queries, and a matching forgery
attack was described by Ferguson, where the main step of the attack recovers
the whitening key. In this work we study recent authenticated encryption
algorithms inspired by OCB, such as Marble, AEZ, and COPA. While
Ferguson's attack is not applicable to those algorithms, we show that it is
still possible to recover the secret mask with birthday complexity. Recovering
the secret mask easily leads to a forgery attack, but it also leads to
more devastating attacks, with a key-recovery attack against Marble and
AEZ v2 and v3 with birthday complexity.

For Marble, this clearly violates the security claims of full n-bit security.
For AEZ, this matches the security proof, but we believe it is nonetheless
a quite undesirable property that collision attacks allow recovery of the
master key, and more robust designs would be desirable.

Our attack against AEZ is generic and independent of the internal 
permutation (in particular, it still works with the full AES), but the key- 
recovery is specific to the key derivation used in AEZ v2 and v3. Against 
Marble, the forgery attack is generic, but the key-recovery exploits the 
structure of the E permutation (4 AES rounds). In particular, we intro- 
duce a novel cryptanalytic method to attack 3 AES rounds followed by 3 
inverse AES rounds, which can be of independent interest. 


Keywords: CAESAR competition • Authenticated encryption • Cryptanalysis •
Marble • AEZ • PMAC • Forgery • Key-recovery


1 Introduction 

The purpose of an Authenticated Encryption scheme is to provide both privacy 
and integrity with a single cryptographic algorithm. In 2014, the CAESAR com- 
petition was launched with the goal to identify good Authenticated Encryption 
schemes as better alternatives to current options such as AES-GCM [14].
57 candidates have been submitted to the CAESAR competition, and they must
now be analyzed carefully.

In this paper, we provide a security analysis of the AES-based candidates 
Marble [5] and AEZ v3 [7]. Both designs are inspired by OCB [16], designed 
in 2001 by Rogaway, Bellare, Black, and Krovetz. They are built as modes of
operation of a block cipher,¹ using secret offsets at the input and/or output of
the block cipher calls.


OCB. In OCB, a whitening key $L$ is derived from the master key $K$, and the
$i$-th message block $M_i$ is enciphered to
$C_i = E_K(M_i \oplus \gamma_i \cdot L) \oplus \gamma_i \cdot L$, where
$\gamma_i$ is a (Gray) counter, $\cdot$ is a finite field multiplication, and
$\gamma_i \cdot L$ is the $i$-th offset. This design principle was later
formalized as the XE and XEX construction [15], and proved to efficiently turn
a secure block cipher into a secure tweakable block cipher [12]. OCB with an
$n$-bit block cipher is proven secure up to $2^{n/2}$ queries, and Ferguson
showed a collision attack matching the bound [3]. The attack uses a long
message $M$, so that there is a collision between two block cipher inputs:

    $M_i \oplus \gamma_i \cdot L = M_j \oplus \gamma_j \cdot L$

The collision can be detected because $M_i \oplus C_i = M_j \oplus C_j$, and
the value of $L$ is recovered as
$(\gamma_i \oplus \gamma_j)^{-1} \cdot (M_i \oplus M_j)$. When $L$ is known, it
is easy to forge messages.
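
The last step of this attack is a single division in the finite field. The sketch below illustrates it under stated assumptions: it uses $\mathbb{F}_{2^{128}}$ with the polynomial $x^{128}+x^7+x^2+x+1$, simulates a collision instead of querying a real OCB oracle, and the function names and counter values are ours, chosen for illustration.

```python
import random

# x^128 + x^7 + x^2 + x + 1; integers encode polynomials over GF(2).
MODULUS = (1 << 128) | 0x87


def gf_mul(a, b):
    """Multiply two elements of GF(2^128): carry-less product with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a >> 128:
            a ^= MODULUS
        b >>= 1
    return result


def gf_inv(a):
    """Inverse of a nonzero element, computed as a^(2^128 - 2)."""
    result, exponent = 1, (1 << 128) - 2
    while exponent:
        if exponent & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        exponent >>= 1
    return result


def recover_whitening_key(gamma_i, gamma_j, m_i, m_j):
    """Solve M_i ^ gamma_i*L == M_j ^ gamma_j*L for L."""
    return gf_mul(gf_inv(gamma_i ^ gamma_j), m_i ^ m_j)


# Simulated collision: choose L and M_i, then derive the M_j that collides with it.
rng = random.Random(2015)
L = rng.getrandbits(128)
gamma_i, gamma_j = 0b0110, 0b1100  # two distinct, hypothetical counter values
m_i = rng.getrandbits(128)
m_j = m_i ^ gf_mul(gamma_i ^ gamma_j, L)
assert recover_whitening_key(gamma_i, gamma_j, m_i, m_j) == L
```

In a real attack the pair $(i, j)$ would be found by looking for equal values of $M_i \oplus C_i$ in a long message; only the final algebraic step is shown here.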


Marble. Marble [5] is a CAESAR candidate by Jian Guo inspired by COPA [1]. 
COPA was designed in 2013, and combines OCB’s offsets with an internal depen- 
dency chain in order to achieve some security in the case of nonce repetition. 
Marble uses two internal chains in order to prevent birthday attacks on the 
internal chain, and uses reduced-round AES as building blocks. Marble claims 
security against nonce-repetition, and against release of unverified plaintexts, 
but cannot hide common prefixes in case of nonce reuse (Marble is online). 
As opposed to most CAESAR candidates, Marble claims full 128-bit security 
(beyond the birthday bound). The structure of Marble can be seen in Fig. 2. 

Results presented so far on Marble include a ciphertext distinguisher with
complexity $2^{64}$, similar to the distinguisher against the counter mode [17].


AEZ. AEZ is a CAESAR candidate designed by Hoang, Krovetz, and Rogaway.
The authors define the security notion of Robust AE, which is the optimal
security achievable when nonces are repeated, and unverified plaintexts are
released. AEZ is claimed to achieve this security notion. In this paper, we
focus on the current version of AEZ, AEZ v3, as proposed on the
crypto-competition mailing list, and presented at DIAC [7]. AEZ v3 has also
been accepted at Eurocrypt 2015, and presented as one of the honorable mentions
for the best paper award [8]. Our result can also be applied to AEZ v2, but not
to AEZ v1, because of a different key expansion.

As far as we are aware, no cryptanalysis of AEZ has been published so far.

¹ For efficiency reasons, Marble and AEZ actually use 4 rounds of AES rather
than a full block cipher.


Our Results. In this paper, we describe generic collision attacks against Marble
and AEZ, which recover the whitening key with about $2^{n/2}$ chosen message
queries. When the whitening key is known, the security offered by Marble and
AEZ crumbles and we show a forgery attack using a single extra encryption
query. Moreover, we extend this result to key-recovery attacks using properties
of the internal permutations and/or the key scheduling.

Our results are summarized in Tables 1 and 2. The data complexity is listed 
in number of message blocks (16 bytes). We now detail our results on each 
Authenticated-Encryption with Associated-Data (AEAD) scheme. 

Marble. Our attack against Marble uses queries with repeated nonces, which
should be secure according to the security claims of Marble. Since Marble claims
security beyond the birthday bound (allowing up to $2^n$ blocks of data), the
forgery attack using collisions clearly violates the security claims. In
addition, we show that if unverified plaintexts are released, i.e. if we can
obtain plaintexts from ciphertexts without a valid tag, then we can further
recover the master key $K$. For this attack, we build special queries so that
only 3 forward AES rounds and 3 backward AES rounds are active, and we develop
a novel method to attack such a reduced cipher with only known
plaintexts/ciphertexts. Our attack can be built upon two different
distinguishers: the first one is based on the detection of collision events,
and the second one on a statistical property. In both cases, our attack
requires about $2^{33}$ queries and its time complexity is $2^{64}$; we believe
this result is also of independent interest.

Following the disclosure of this attack, Guo proposed a minor modification 
of the specifications of Marble as version 1.2 [6]. However, our attack is still 
applicable to the modified version, as shown independently by ourselves and 
Lu [13]. Guo later decided to withdraw Marble from the CAESAR competition. 

AEZ. Our analysis of AEZ v3 focuses on the processing of Associated Data.
In particular, if AEZ is used with an empty message and no nonce, it turns into
a variant of PMAC, and the security notion of Robust AE becomes the usual
MAC security notion. We show how to recover the whitening key of this variant
of PMAC with a collision attack (a collision attack is also possible against the
standard PMAC, e.g. following [11]). More importantly, the key derivation of
AEZ allows the master key $K$ to be recovered from the whitening key.

This attack does not violate the security proof, but matches the security bound.
However, collision attacks usually have a more limited impact (e.g. only
affecting authenticity), and it seems quite unfortunate that a collision attack
leads to a key-recovery. This property should probably be avoided when
possible.²

² In AEZ v4, for the second round of the competition, the designers took into
account our result and modified the key derivation in order to prevent this
property.
Table 1. Our results against Marble.

Attack (Sec. claim)         Data                                               Time
Recover $L$                 $2^{65} \times 2$ CP                               $2^{64}$
Forgery ($2^{128}$)         $2^{65} \times 2$ CP                               $2^{64}$
Key-recovery ($2^{128}$):
  Collision-based (a)       $2^{65} \times 2$ CP + $2^{32.6} \times 130$ CC    $2^{64}$
  Collision-based           $2^{65} \times 2$ CP + $2^{33} \times 130$ CC      $2^{64}$

(a) The chosen ciphertexts use the decryption-misuse model.


Table 2. Our results against AEZ.

Attack         Data (a)      Time   Success probability
Key-recovery   $2^{66.6}$    1      1
Key-recovery   $2^{44}$      1      $2^{-45.2}$

(a) The AEZ specification requires the key to be rotated after $2^{44}$ blocks.


COPA. After the release of an early version of this paper, Lu applied the same
techniques to COPA, and described an attack to recover the whitening key [13].
The main attack in this paper actually targets the associated data processing,
which uses PMAC, and can be applied to PMAC. However, the impact of this
result is unclear because COPA and PMAC do not claim security beyond the
birthday bound, and this attack cannot be turned into a key-recovery attack.


Outline of the Paper. Since our collision attack on AEZ is much simpler 
than the attack against Marble, we first explain it in Sect. 2. Then we give a 
short description of the Marble authenticated encryption algorithm in Sect. 3. 
In Sect. 4, we show how to recover the whitening key L and describe our forgery 
attack. Finally, we demonstrate in Sect. 5 how to recover the encryption key K 
from decryption-misuse queries. 

2 Collision Attack Against AEZ 

We first explain the collision attack on AEZ and the resulting key-recovery
attack.


2.1 Short Description of AEZ 

For simplicity, we consider AEZ used with only associated data, without any
nonce or message (the attack can easily be applied with a fixed nonce and
message if required). In this case, AEZ turns into a variant of PMAC, and the
security claim becomes the usual MAC security definition. A particularity of
AEZ is that it allows a vector-valued input, i.e. it can authenticate a sequence
of strings rather than a single string.

More precisely, the MAC is computed as follows:

- The key derivation algorithm generates keys $k_0$, $k_1$, and whitening keys
  $J$, $L$.

- Full data blocks $A^j_i$ of the $j$-th string (indexed from 1) are processed as:

    $X^j_i = E_{k_0}(A^j_i \oplus (i \bmod 8) \cdot J \oplus 2^{\lfloor (i-1)/8 \rfloor} \cdot L \oplus 8j \cdot J)$

- If the last block is partial, it is enciphered as:

    $X^j_i = E_{k_0}(A^j_i \oplus 8j \cdot J)$

- The first block to be processed is the ciphertext extension $\tau$
  (corresponding to the tag length). It is $\tau = 128$ by default.

- The tag is computed as $E_{k_1}(\bigoplus_{i,j} X^j_i)$,

where $E$ is a full or reduced-round AES. This is illustrated by Fig. 1.



Fig. 1. AEZ used as a MAC (no message, no nonce, two AD strings).


2.2 Collision Attack on AEZ 

In order to mount a collision attack against AEZ, we consider two sets of
messages, with $C$ a fixed block:

- $\mathcal{A} = \{A_x \mid x \in \{0, \ldots, 2^{64}-1\}\}$, with $A_x = (\tau; C; (C \| [x] \| 0^{64}))$

- $\mathcal{B} = \{B_y \mid y \in \{0, \ldots, 2^{64}-1\}\}$, with $B_y = (\tau; (C \| 0^{64} \| [y]); C)$

All messages are made of two separate strings; messages in $\mathcal{A}$ have a
string of one block and a string of two blocks, while messages in $\mathcal{B}$
have a string of two blocks and a string of one block. In particular, we have:

    $MAC(A_x) = E_{k_1}\big(E_{k_0}(\tau \oplus L \oplus 9J) \oplus E_{k_0}(C \oplus L \oplus 17J)$
                $\oplus E_{k_0}(C \oplus L \oplus 25J) \oplus E_{k_0}(([x] \| 0^{64}) \oplus L \oplus 26J)\big)$

    $MAC(B_y) = E_{k_1}\big(E_{k_0}(\tau \oplus L \oplus 9J) \oplus E_{k_0}(C \oplus L \oplus 17J)$
                $\oplus E_{k_0}((0^{64} \| [y]) \oplus L \oplus 18J) \oplus E_{k_0}(C \oplus L \oplus 25J)\big)$

This leads to a simple collision attack: $MAC(A_x) = MAC(B_y)$ if and only if
$[x] \| [y] = \delta \cdot J$ (where $\delta = 18 \oplus 26$). With
$\mathcal{A}$ and $\mathcal{B}$ of size $2^{64}$ as defined above, there is
exactly one collision, and the collision immediately reveals the value of
$J = \delta^{-1} \cdot ([x] \| [y])$.
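
The collision search can be illustrated on a scaled-down model. The sketch below is a toy under explicit assumptions, not AEZ itself: blocks are 8 bits instead of 128, the field is $\mathbb{F}_{2^8}$ with the AES polynomial, $E_{k_0}$ and $E_{k_1}$ are random permutations, and all names (`make_toy_mac`, `recover_j`) are ours. Only the offset structure $(8j + i) \cdot J \oplus L$ follows the description above.

```python
import random

POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, a toy stand-in for the 128-bit field


def gmul(a, b):
    """Multiply two elements of GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


def ginv(a):
    """Inverse in GF(2^8) by exhaustive search (fine at toy size)."""
    return next(b for b in range(1, 256) if gmul(a, b) == 1)


def make_toy_mac(seed):
    """Return (mac, J) where mac authenticates a list of strings of 8-bit blocks."""
    rng = random.Random(seed)
    e_k0 = rng.sample(range(256), 256)  # random permutation standing in for E_k0
    e_k1 = rng.sample(range(256), 256)  # random permutation standing in for E_k1
    J, L = rng.randrange(1, 256), rng.randrange(1, 256)

    def mac(strings):
        acc = 0
        for j, blocks in enumerate(strings, start=1):
            for i, block in enumerate(blocks, start=1):
                offset = gmul(i % 8, J) ^ gmul(1 << ((i - 1) // 8), L) ^ gmul(8 * j, J)
                acc ^= e_k0[block ^ offset]
        return e_k1[acc]

    return mac, J


def recover_j(mac, tau=0x80, c=0x00):
    """Find the unique collision between the sets A and B, then solve for J."""
    table = {mac([[tau], [c], [c, x << 4]]): x for x in range(16)}  # A_x
    for y in range(16):
        tag = mac([[tau], [c, y], [c]])                             # B_y
        if tag in table:
            x = table[tag]
            return gmul(ginv(18 ^ 26), (x << 4) | y)
    return None


mac, secret_j = make_toy_mac(seed=1)
assert recover_j(mac) == secret_j
```

With 4-bit halves the two sets contain 16 messages each, mirroring the $2^{64}$ messages per set of the real attack.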

Key Recovery. Surprisingly, the key derivation of AEZ allows the master key
$K$ to be recovered from the whitening key $J$. More precisely, if the master
key $K$ is of length 128 bits or smaller, $J$ is an encryption of $K$ under a
known constant $C$: $J = \mathrm{AES4}_C(K)$. This can easily be inverted:
$K = \mathrm{AES4}_C^{-1}(J)$. We note that this is not the case in PMAC, where
the whitening key is an encryption of 0 under the secret key $K$:
$L = \mathrm{AES}_K(0)$.

This attack matches the security proof of AEZ and does not violate the 
security claims. However, a complete break of AEZ after the birthday bound is 
not expected: most schemes with birthday-bound security are more resilient and 
collision attacks don’t allow key-recovery. 

It should be mentioned that the Eurocrypt version of AEZ does not explicitly 
specify a key derivation algorithm, and leaves it as an open problem: 

“The key $K \in \mathrm{Byte}^*$ is mapped to three 16-byte subkeys $(I, J, L)$
using the key-derivation function (KDF) named Extract that is called at line
401. The definition of Extract is omitted from the figures and regarded
as orthogonal to the rest of AEZ. See the AEZ spec for the current
Extract: $\mathrm{Byte}^* \to \mathrm{Byte}^{48}$. In our view, it is an
unresolved matter what the security properties (and even what signature) of a
good KDF should be. Work has gone off in very different directions, and the
area is currently the subject of a Password Hashing Competition (PHC) running
concurrently with CAESAR.”

Clearly, the key derivation of the AEZ v3 specification does not have the security 
properties of a good KDF. 

Data Limit. The AEZ specification requires users to change the key after
encrypting $2^{48}$ bytes, i.e. $2^{44}$ blocks. This prevents the attack as
described above. However, we can perform the attack with smaller sets
$\mathcal{A}$ and $\mathcal{B}$ of size $2^{41.4}$, with a success probability
of $2^{-45.2}$. This is still much more efficient than generic attacks: with a
time complexity of $2^{44}$, a brute-force key search only succeeds with a
probability of $2^{-84}$.
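
These figures can be re-derived with a few lines of arithmetic. The sketch below is ours and assumes that each query is counted as 3 data blocks (the two strings, without $\tau$) and that success requires both halves of $\delta \cdot J$ to fall in the ranges covered by the reduced sets; the helper name `attack_figures` is hypothetical.

```python
import math

BLOCK_BITS = 128
BLOCKS_PER_MESSAGE = 3  # assumption: C, C and the block carrying [x] or [y]


def attack_figures(log2_set_size):
    """Return (log2 of data in blocks, log2 of success probability)."""
    log2_messages = log2_set_size + 1  # two sets, A and B
    log2_blocks = log2_messages + math.log2(BLOCKS_PER_MESSAGE)
    # Both 64-bit halves of delta*J must be covered by the chosen x and y values.
    log2_success = min(0.0, 2 * log2_set_size - BLOCK_BITS)
    return log2_blocks, log2_success


print(attack_figures(64))    # full attack
print(attack_figures(41.4))  # attack restricted to about 2^44 blocks
print(44 - BLOCK_BITS)       # log2 success of 2^44 brute-force key trials
```

The three lines correspond to roughly $2^{66.6}$ blocks with certain success, roughly $2^{44}$ blocks with success probability $2^{-45.2}$, and the $2^{-84}$ of the generic key search.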
3 Description of Marble

Marble is an authenticated encryption algorithm designed by Guo [5] with
key length and tag length both equal to 128 bits. A plaintext and its
associated data are divided into blocks of 128 bits and are then processed
consecutively. Its internal permutation is based on a modified version of the
AES block cipher. Unlike other authenticated encryption algorithms, Marble does
not require a nonce.

Marble has very strong security claims: it claims to offer full 128-bit security,
i.e. an attack should take time $T = 2^{128}$ even after a large amount of data
has been encrypted with the same key (up to $D = 2^{128}$). This is in contrast
to many CAESAR candidates and classical modes of operation for block ciphers
(e.g. GCM), which only offer a birthday level of security, i.e. the ciphers are
secure as long as $T \cdot D < 2^{128}$.

In addition, Marble does not use nonces, and the security claim even holds if
unverified plaintexts are released, i.e. the adversary can request the decryption
of a ciphertext $C$ without knowing a valid tag corresponding to $C$
(decryption-misuse oracle). A few other CAESAR candidates allow the release of
unverified plaintext (AEZ, POET, APE, Minalpher), but they only claim birthday
security.

An overview of Marble is depicted in Fig. 2. The Marble mode of operation
makes use of two 128-bit chaining variables $s_1$ and $s_2$, initialized with
constants $const_1$ and $const_2$. The associated data and the plaintext are
padded independently, so both resulting fields $A$ and $P$ can be divided into
128-bit blocks. We do not describe the padding function here, as it does not
affect our attacks. We will denote a message to encrypt by $(AD \| M)$, where
$AD$ is a vector containing $l_A$ 128-bit blocks of associated data and $M$ is
a vector containing $l_M$ 128-bit blocks of plaintext.

The internal primitive used is a modified block cipher, as intermediate values
of the block are combined with the incoming chaining variables. Formally,
the primitive uses 3 internal keyed permutations $E_1$, $E_2$ and $E_3$ and
processes 128-bit blocks as follows. On input $(P, s_1, s_2)$, the output
$(C, s'_1, s'_2)$ is defined as

    $X = E_{1,K}(P)$
    $(X', s'_1) = (3X + s_1, X + s_1)$
    $Y = E_{2,K}(X')$
    $(Y', s'_2) = (3Y + s_2, Y + s_2)$
    $C = E_{3,K}(Y')$

Note that additions and multiplications are performed in the binary Galois
field $\mathbb{F}_{2^{128}}$ defined by the primitive polynomial
$x^{128} + x^7 + x^2 + x + 1$. Furthermore, polynomials $\sum a_i x^i$ are
denoted by the integers $\sum a_i 2^i$. Therefore, please note that additions
and multiplications on such objects have to be interpreted as operations in the
binary field (and not on the integer ring) and have to be handled carefully.
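
To make the notation concrete, the following sketch shows how integer constants such as 3 or 5 act as field elements. It is an illustration under the conventions just stated; the function names `gf_mul` and `trans` are ours, and `trans` simply mirrors the update $(3X + s, X + s)$ given above.

```python
# x^128 + x^7 + x^2 + x + 1; integers encode polynomials over GF(2).
MODULUS = (1 << 128) | 0x87


def gf_mul(a, b):
    """Multiply two elements of GF(2^128): carry-less product with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a >> 128:
            a ^= MODULUS
        b >>= 1
    return result


def trans(x, s):
    """Combine a block x with a chaining value s: returns (3x + s, x + s)."""
    return gf_mul(3, x) ^ s, x ^ s


# "3 * 3" is (x + 1)^2 = x^2 + 1, i.e. 5, not the integer 9.
assert gf_mul(3, 3) == 5
# Addition is xor, so 3 + 6 = 5.
assert 3 ^ 6 == 5
# Doubling the top-degree monomial wraps around through the reduction polynomial.
assert gf_mul(2, 1 << 127) == 0x87
# Applying the update twice with the same block restores the chaining value.
_, s_next = trans(0x1234, 0xABCD)
_, s_back = trans(0x1234, s_next)
assert s_back == 0xABCD
```

The identities $3^2 = 5$ and $3 \oplus 6 = 5$ checked here are the ones used in the attacks of Sect. 4.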

In the case of Marble, each one of the three permutations $E_1$, $E_2$ and
$E_3$ consists of 4 full AES rounds (i.e. SubBytes, ShiftRows, MixColumns and
AddRoundKey). One can notice that no key addition is performed at the beginning
of those permutations. 12 subkeys are therefore required. A 128-bit master
key $K$ is expanded into 11 subkeys using the AES-128 key-schedule algorithm.
The master key itself is used as the 12th subkey. For more information about
the AES block cipher, we refer to [2].

The Marble encryption then works as follows. First, a mask $L$ is derived
from the key $K$ by encrypting a constant $const_0$ (which also sets the
key-dependent values $S_1[0]$, $S_2[0]$). For each associated data block $A_i$
($i \geq 1$), a pre-whitening key is defined as $2^{i-1} 3^2 L$. For each
plaintext block $M_i$, a pre-whitening key and a post-whitening key are defined
as $2^i L$ and $2^{i-1} 3 L$. These blocks are processed iteratively, starting
with associated data, as follows:

1. Addition (i.e. xor) of the pre-whitening key;

2. Application of the internal primitive;

3. For plaintext blocks, application of the post-whitening key, which results in
ciphertext blocks.

Finally, the tag is computed by encrypting an extra block defined as the
sum of all plaintext blocks and all encrypted additional data blocks, with
pre-whitening key $2^{l_M} 7 L$ and post-whitening key $2^{l_M - 1} 3 L$.

4 A Universal Forgery Attack on Marble 

In this section, we first describe a method to find the mask $L$ using about
$2^{65}$ chosen plaintext queries. Then, we use this knowledge to compute
forgeries. Our attack makes it possible to modify the associated data field in
a way that affects neither the ciphertext nor the authentication tag. It can
therefore be used to compute universal forgeries in a chosen plaintext setting.

4.1 Recover the Mask L 

The main idea of the attack is to build a pair of messages $M \neq M'$ so that
the inputs to the $E_1$ functions are the same for both messages. This is
possible if $M$ and $M'$ have the same total length, but the associated data
and message parts have different lengths. When the inputs to $E_1$ collide, all
the intermediate computations collide, and we can detect this event on the
resulting ciphertexts. Please note that as different multiples of $L$ are used
for post-whitening, this operation is more tricky than detecting a collision on
ciphertexts. In the following we use 2 blocks of AD and 1 block of message for
$M$, but 1 block of AD and 2 blocks of message for $M'$.

More precisely, we encrypt messages $M_\alpha$ and $M'_\beta$, for different
values $\alpha, \beta \in \mathbb{F}_{2^{128}}$, defined as follows (where
$A \in \mathbb{F}_{2^{128}}$ is a constant value):

- $M_\alpha = (AD[1], AD[2] \| M[1]) = (A, 8\alpha \| 6\alpha)$;

- $M'_\beta = (AD[1] \| M'[1], M'[2]) = (A \| 8\beta, 6\beta)$.
Fig. 2. General design of Marble.

Fig. 3. The TRANS operation.

In the following, we consider sets of $2^{64}$ values $\alpha$ and $\beta$ so
that $\alpha \oplus \beta$ covers all values in $\mathbb{F}_{2^{128}}$. The
inputs to the $E_1$ layer will be respectively (we note that $3^2 = 5$ in
$\mathbb{F}_{2^{128}}$):

    $x_1 = A \oplus 5L$     $x_2 = 8\alpha \oplus 10L$    $x_3 = 6\alpha \oplus 2L$    for $M_\alpha$
    $x'_1 = A \oplus 5L$    $x'_2 = 8\beta \oplus 2L$     $x'_3 = 6\beta \oplus 4L$    for $M'_\beta$

In particular, we have:

    $x_1 \oplus x'_1 = 0$    $x_2 \oplus x'_2 = 8(\alpha \oplus \beta \oplus L)$    $x_3 \oplus x'_3 = 6(\alpha \oplus \beta \oplus L)$

Therefore, the inputs to $E_1$ collide when $\alpha \oplus \beta = L$.

We denote the output of the $E_3$ layer as $y_i$ (respectively $y'_i$), and the
corresponding ciphertexts as $C_\alpha[1]$ (respectively
$(C'_\beta[1], C'_\beta[2])$). We have:

    $C_\alpha[1] = y_3 \oplus 3L$    $C'_\beta[1] = y'_2 \oplus 3L$    $C'_\beta[2] = y'_3 \oplus 6L$

In particular, if $\alpha \oplus \beta = L$, we have $x_i = x'_i$ for
$i \leq 3$, therefore $y_i = y'_i$ for $i \leq 3$, and
$C_\alpha[1] \oplus C'_\beta[2] = 5L$ (since $3 \oplus 6 = 5$). In order to
detect this event efficiently we match the sets of values
$\{C_\alpha[1] \oplus 5\alpha\}$ and $\{C'_\beta[2] \oplus 5\beta\}$. When
$\alpha \oplus \beta = L$, we have a match, and we can easily filter false
positives using a new message pair with a different value of the constant $A$.
The full algorithm is given by Algorithm 1, using $2^{65}$ short encryption
queries.
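
The matching step can be exercised on a scaled-down model. The sketch below is a toy under explicit assumptions, not Marble itself: blocks are 8 bits, the field is $\mathbb{F}_{2^8}$ with the AES polynomial, $E_1$, $E_2$, $E_3$ are random permutations, the initial chaining values are random, and the tag is omitted. The names `make_toy_marble` and `recover_mask` are ours; only the whitening masks and the chaining follow the description of Sect. 3.

```python
import random


def gmul(a, b, poly=0x11B):
    """Multiply two elements of GF(2^8), a toy stand-in for the 128-bit field."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def make_toy_marble(seed):
    """Return (encrypt, L) for a toy cipher with Marble-like masks and chaining."""
    rng = random.Random(seed)
    e1, e2, e3 = (rng.sample(range(256), 256) for _ in range(3))
    L = rng.randrange(1, 256)
    s_init = (rng.randrange(256), rng.randrange(256))

    def primitive(p, s1, s2):
        x = e1[p]
        xp, s1 = gmul(3, x) ^ s1, x ^ s1
        y = e2[xp]
        yp, s2 = gmul(3, y) ^ s2, y ^ s2
        return e3[yp], s1, s2

    def encrypt(ad, msg):
        s1, s2 = s_init
        mask = gmul(5, L)  # 2^(i-1) * 3^2 * L for i = 1
        for block in ad:
            _, s1, s2 = primitive(block ^ mask, s1, s2)
            mask = gmul(2, mask)
        pre, post = gmul(2, L), gmul(3, L)  # 2^i * L and 2^(i-1) * 3 * L for i = 1
        out = []
        for block in msg:
            c, s1, s2 = primitive(block ^ pre, s1, s2)
            out.append(c ^ post)
            pre, post = gmul(2, pre), gmul(2, post)
        return out

    return encrypt, L


def recover_mask(encrypt, checks=(1, 2, 3, 4)):
    """Toy version of the matching step: alpha in the low nibble, beta in the high one."""
    def left(a_const, alpha):   # C_alpha[1] xor 5*alpha
        return encrypt([a_const, gmul(8, alpha)], [gmul(6, alpha)])[0] ^ gmul(5, alpha)

    def right(a_const, beta):   # C'_beta[2] xor 5*beta
        return encrypt([a_const], [gmul(8, beta), gmul(6, beta)])[1] ^ gmul(5, beta)

    table = {}
    for alpha in range(16):
        table.setdefault(left(0, alpha), []).append(alpha)
    for beta in range(0, 256, 16):
        for alpha in table.get(right(0, beta), []):
            # Filter false positives with other values of the constant A.
            if all(left(a, alpha) == right(a, beta) for a in checks):
                return alpha ^ beta
    return None


encrypt, secret_mask = make_toy_marble(seed=3)
print(recover_mask(encrypt) == secret_mask)
```

Because 8-bit blocks produce many accidental matches, the toy filters each candidate with several extra constants, whereas a single additional pair is used in the 128-bit setting.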


4.2 An Attack Against Marble 1.2 

After the first release of our attack, Guo made a minor modification to the
specification of Marble [6]. Namely, the input mask for the last block of
associated data is changed from $2^{i-1} 3^2 L$ to $2^{i-1} 3^3 L$. Our attacks
can be adapted as follows.

The adversary needs to query an encryption oracle for messages $M_\alpha$ and
$M'_\beta$, defined as

- $M_\alpha = (AD[1], AD[2] \| M[1]) = (10\alpha, 28\alpha \| 6\alpha)$;

- $M'_\beta = (AD[1] \| M'[1], M'[2]) = (10\beta \| 28\beta, 6\beta)$.

Using the notations of Sect. 4.1, the inputs to the $E_1$ layer will be:

    $x_1 = 10\alpha \oplus 5L$     $x_2 = 28\alpha \oplus 30L$    $x_3 = 6\alpha \oplus 2L$    for $M_\alpha$
    $x'_1 = 10\beta \oplus 15L$    $x'_2 = 28\beta \oplus 2L$     $x'_3 = 6\beta \oplus 4L$    for $M'_\beta$
Algorithm 1. Recover $L$ from an encryption oracle $\mathcal{E}$.

    $H \leftarrow \emptyset$                                      ▷ $H$ is a hash table
    for $\alpha \in \{0, 1, \ldots, 2^{64} - 1\}$ do
        $(C[1] \| T) \leftarrow \mathcal{E}(0, 8\alpha \| 6\alpha)$
        $H\{C[1] \oplus 5\alpha\} \leftarrow \alpha$
    end for
    for $\beta \in \{0, 2^{64}, \ldots, 2^{128} - 2^{64}\}$ do
        $(C'[1], C'[2] \| T) \leftarrow \mathcal{E}(0 \| 8\beta, 6\beta)$
        if $H\{C'[2] \oplus 5\beta\}$ exists then
            $\alpha \leftarrow H\{C'[2] \oplus 5\beta\}$
            $(D[1] \| T) \leftarrow \mathcal{E}(1, 8\alpha \| 6\alpha)$
            $(D'[1], D'[2] \| T) \leftarrow \mathcal{E}(1 \| 8\beta, 6\beta)$
            if $D[1] \oplus 5\alpha = D'[2] \oplus 5\beta$ then
                return $\alpha \oplus \beta$
            end if
        end if
    end for



In particular, we have:

    $x_1 \oplus x'_1 = 10 \cdot (\alpha \oplus \beta \oplus L)$,
    $x_2 \oplus x'_2 = 28 \cdot (\alpha \oplus \beta \oplus L)$,
    $x_3 \oplus x'_3 = 6 \cdot (\alpha \oplus \beta \oplus L)$.

If for some $(\alpha, \beta)$, $\alpha \oplus \beta = L$, then $x_i = x'_i$ for
$i = 1, 2, 3$. Then, the outputs of $E_3$ verify $y_1 = y'_1$, $y_2 = y'_2$ and
$y_3 = y'_3$ and therefore, $C_\alpha[1] \oplus 3L = C'_\beta[2] \oplus 6L$.
As $3 \oplus 6 = 5$, this can also be expressed as:

    $C_\alpha[1] \oplus 5\alpha = C'_\beta[2] \oplus 5\beta$.

Therefore, $L$ has to be searched among the values $(\alpha \oplus \beta)$ for
which this relation holds. As for our attack on the previous version of Marble,
about $2^{64}$ different values of both $\alpha$ and $\beta$ are required.


4.3 Computing Forgeries on Marble Without Whitening Keys 

Once we have retrieved L, we can consider a simplified description of Marble 
where the masks are removed, as depicted in Fig. 4. In its mask-less descrip- 
tion, Marble possesses an interesting property as described in Fig. 5: a series of 
identical input blocks X has a periodic effect on the internal state. 

Indeed, if we let $E_1(X) = u$, $E_2(3S_1 \oplus u) = v$ and $E_2(3S_1 \oplus 2u) = w$, it is easy
to see that after encrypting 4 blocks $X$, the internal states $S_1$ and $S_2$ remain
unchanged. Furthermore, if we use a series of 8 consecutive identical associated
data blocks $X$, the effect on $\tau$ also cancels out. This leads to a universal forgery
attack: for any associated data $AD$ and plaintext $M$, the adversary computes
the masked value $B$ of a chunk of 8 identical blocks of associated data after $AD$
and queries the encryption oracle on $((AD, B) \,\|\, M)$. The answer $(C \,\|\, T)$ to that
query is also a valid ciphertext for $(AD \,\|\, M)$, therefore the adversary can return
$(C \,\|\, T)$ as a forgery. The attack is given as Algorithm 2.

Fig. 4. Mask-less description of Marble. $S_1$ and $S_2$ are unknown key-dependent values.


Algorithm 2. Compute the ciphertext $(C \,\|\, T)$ from $(AD \,\|\, M)$ using
known $L$.

    $B \leftarrow (2^{i} \cdot 3^{2} \cdot L) \,\big\|_{i=a}^{a+7}$
    $(C \,\|\, T) \leftarrow \mathcal{E}_K((AD, B) \,\|\, M)$        ▷ Encryption oracle call
    return $(C \,\|\, T)$
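
The periodicity behind this forgery can be illustrated with a toy model. The sketch
below is not Marble: the keyed permutations are replaced by hash-based random
functions, and the lane updates ($S_1 \leftarrow S_1 \oplus u$, $S_2 \leftarrow S_2 \oplus v$) are our
reading of the TRANS operation, chosen to be consistent with the equations of
Sect. 5.1 rather than taken from the specification. The names times3, toy_E and
absorb are hypothetical, and the accumulator $\tau$ is not modelled.

```python
import hashlib
import secrets

MASK128 = (1 << 128) - 1


def times3(a: int) -> int:
    """Multiply by 3 = x + 1 in GF(2^128), with 2^128 = 2^7 + 2^2 + 2 + 1."""
    doubled = a << 1
    if doubled >> 128:
        doubled = (doubled & MASK128) ^ 0x87
    return doubled ^ a


def toy_E(label: bytes, block: int) -> int:
    """Stand-in for a keyed permutation E_i (a hash-based random function)."""
    digest = hashlib.sha256(label + block.to_bytes(16, "big")).digest()
    return int.from_bytes(digest[:16], "big")


def absorb(block: int, s1: int, s2: int):
    """Process one mask-less block and return the updated lanes (s1, s2)."""
    u = toy_E(b"E1", block)
    v = toy_E(b"E2", u ^ times3(s1))   # the input of E_2 is u xor 3*S_1
    return s1 ^ u, s2 ^ v              # assumed lane updates


s1_start, s2_start = secrets.randbits(128), secrets.randbits(128)
X = secrets.randbits(128)

s1, s2 = s1_start, s2_start
for _ in range(4):                     # four identical blocks X
    s1, s2 = absorb(X, s1, s2)
print((s1, s2) == (s1_start, s2_start))   # True
```

In this model the inputs of $E_2$ alternate between $3S_1 \oplus u$ and $3S_1 \oplus 2u$, so
$S_2$ receives $v, w, v, w$ and returns to its initial value after four blocks.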


5 Key-Recovery Attack 

We now show how to recover the master key once the mask L has been deter- 
mined. In order to simplify the description of the attack, we now focus on the 
mask-less variant of Marble; however the full attack can easily be adapted to the 
full version of Marble with a known mask. 

The main idea is to collect pairs of messages with a fixed difference in some
internal state variables. This will allow us to attack a reduced cipher composed
of 4 AES rounds followed by 4 inverse AES rounds rather than a 12-round AES
(see details below). Moreover, we apply this strategy to $E_1$ rather than to $E_3$
because the whitening key of $E_1$ is directly derived from $L$. Since $L$ is known,
the first AES round of $E_1$ is key-independent. Therefore we can peel it off, and
attack only 3 forward rounds and 3 inverse rounds. However, this requires us to
use decryption queries, but we can't forge valid tags for an arbitrary ciphertext
yet, so we use a decryption-misuse oracle.

Fig. 5. Collision on the internal state of the associated data.

5.1 Gathering Pairs 

The first step is to collect pairs of plaintext blocks that have the same difference 
in the Si lane (after the permutation Ei). In order to construct such plaintexts, 
we build pairs of ciphertexts with specific differences and values. More precisely, 
we consider pairs of messages as follows (with the same associated data AD): 

    $\bar{C}_x = (AD \,\|\, 0, 0^{120}, 0, 0, 0, 0, 0, 0, 0, 0, x)$        $C_x[i] = \bar{C}_x[i] \oplus 2^{i-1} \cdot 3 \cdot L$

    $\bar{C}'_x = (AD \,\|\, 1, 0^{120}, 1, 0, 0, 0, 0, 1, 1, 1, x)$        $C'_x[i] = \bar{C}'_x[i] \oplus 2^{i-1} \cdot 3 \cdot L$

where 0 and 1 are constant one-block values and x takes a different value for 
each pair. We decrypt these pairs and we collect the final plaintext blocks. 
We now study the differences in the $S_2$ lane (before the permutation $E_3$).
Using the definition of the TRANS operation as given in Fig. 3, $S_2$ is updated as
follows during decryption:

    $S_2[i+1] = 2 \cdot S_2[i] \oplus E_3^{-1}(C[i])$

With the messages $C_x$ and $C'_x$, we have

    $S_2[129] = 2^{129} S_2[0] \oplus (1 \oplus 2 \oplus \cdots \oplus 2^{128}) A$,
    $S'_2[129] = 2^{129} S_2[0] \oplus (1 \oplus 2 \oplus \cdots \oplus 2^{128}) A \oplus (2^0 \oplus 2^1 \oplus 2^2 \oplus 2^7 \oplus 2^{128}) (A \oplus B)$,

where $A = E_3^{-1}(0)$ and $B = E_3^{-1}(1)$. Since $2^{128} = 2^0 \oplus 2^1 \oplus 2^2 \oplus 2^7$, we have
$S'_2[129] = S_2[129]$. This is shown in Fig. 6, where $\delta = A \oplus B$.
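
The cancellation can be reproduced with a few lines of Python. This is a sketch
under stated assumptions: $A$ and $B$ are replaced by random 128-bit values, the
field is GF(2^128) with the reduction given above, and the names times2 and
final_s2 are hypothetical.

```python
import secrets

MASK128 = (1 << 128) - 1


def times2(a: int) -> int:
    """Multiply by 2 in GF(2^128), using 2^128 = 2^7 + 2^2 + 2 + 1."""
    a <<= 1
    if a >> 128:
        a = (a & MASK128) ^ 0x87
    return a


def final_s2(pattern, s2_start: int, A: int, B: int) -> int:
    """Iterate S_2[i+1] = 2*S_2[i] xor E_3^{-1}(C[i]) over a 0/1 block pattern."""
    s2 = s2_start
    for bit in pattern:
        s2 = times2(s2) ^ (B if bit else A)
    return s2


# First 129 mask-less ciphertext blocks of the two messages (the last block x is left out).
pattern = [0] + [0] * 120 + [0, 0, 0, 0, 0, 0, 0, 0]
pattern_prime = [1] + [0] * 120 + [1, 0, 0, 0, 0, 1, 1, 1]

A, B = secrets.randbits(128), secrets.randbits(128)   # stand-ins for E_3^{-1}(0), E_3^{-1}(1)
s2_start = secrets.randbits(128)

same = final_s2(pattern, s2_start, A, B) == final_s2(pattern_prime, s2_start, A, B)
print(same)   # True
```

The block at position $i$ (counting from 0) is multiplied by $2^{128-i}$, so the five
differing blocks contribute $(2^{128} \oplus 2^7 \oplus 2^2 \oplus 2^1 \oplus 2^0)(A \oplus B) = 0$.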


Fig. 6. Difference propagation in decryption. A red arrow means that there is a fixed
unknown difference. A black arrow means that the difference is null.

We now consider the final plaintext block given by the decryption oracle. 

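    $\bar{P}_x[130] = P_x[130] \oplus 2^{130} \cdot L$
                   $= E_1^{-1}\bigl(E_2^{-1}\bigl(E_3^{-1}(x) \oplus 3 \cdot S_2[129]\bigr) \oplus 3 \cdot S_1[129]\bigr)$

    $\bar{P}'_x[130] = P'_x[130] \oplus 2^{130} \cdot L$
                   $= E_1^{-1}\bigl(E_2^{-1}\bigl(E_3^{-1}(x) \oplus 3 \cdot S_2[129]\bigr) \oplus 3 \cdot S'_1[129]\bigr)$

With $U_x = E_2^{-1}\bigl(E_3^{-1}(x) \oplus 3 \cdot S_2[129]\bigr)$, we have

    $\bar{P}_x[130] = E_1^{-1}\bigl(U_x \oplus 3 \cdot S_1[129]\bigr)$
    $\bar{P}'_x[130] = E_1^{-1}\bigl(U_x \oplus 3 \cdot S'_1[129]\bigr)$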



Therefore, the pair $\bar{P}_x[130]$, $\bar{P}'_x[130]$ can be seen as a plaintext/ciphertext pair
for a cipher with 4 AES rounds, a middle key $S_1[129] \oplus S'_1[129]$, and 4 inverse
AES rounds:

    $\bar{P}_x[130] \to E_1 \to \oplus\, S_1[129] \to U_x \to \oplus\, S'_1[129] \to E_1^{-1} \to \bar{P}'_x[130]$

In addition, we can peel off the outer rounds since there is no whitening key
in $E_1$.


5.2 Extracting the Key 

We must now extract the key of a reduced cipher with 3 AES rounds, and 3
inverse AES rounds. First, we notice that the middle ShiftRow and MixColumn
operations can be removed, if we transform the middle key. In a basic
description, the operations in the middle are ShiftRow, MixColumn, AddKey, then
XORing the constant $S_1[129] \oplus S'_1[129]$, AddKey, InverseMixColumn, and
InverseShiftRow. Instead we move the (unknown) constant addition before ShiftRow,
using the linearity of ShiftRow and MixColumn, so that ShiftRow, MixColumn
and AddKey cancel out with AddKey, InverseMixColumn and InverseShiftRow. We
denote the addition of the modified constant as AddDeltaS, and its value as
$\delta_S$ = InverseShiftRow(InverseMixColumn($S_1[129] \oplus S'_1[129]$)). The middle rounds
are then reduced to byte-wise operations: AddRoundKey, SubByte, AddDeltaS,
InverseSubByte, AddRoundKey. This can be seen as a key-dependent Sbox layer.
These transformations are summarised in Fig. 7.
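
The cancellation can be written out in one line. In the following worked equation,
SR and MC stand for ShiftRow and MixColumn, $s$ is the state entering the middle
part, $\Delta = S_1[129] \oplus S'_1[129]$, and $k$ is the round key of the two AddKey
operations, which we assume to be the same on both sides (as the cancellation
described above requires):

```latex
\begin{align*}
\mathrm{SR}^{-1}\Bigl(\mathrm{MC}^{-1}\bigl(\mathrm{MC}(\mathrm{SR}(s)) \oplus k \oplus \Delta \oplus k\bigr)\Bigr)
  &= \mathrm{SR}^{-1}\Bigl(\mathrm{MC}^{-1}\bigl(\mathrm{MC}(\mathrm{SR}(s))\bigr)\Bigr)
     \oplus \mathrm{SR}^{-1}\bigl(\mathrm{MC}^{-1}(\Delta)\bigr) \\
  &= s \oplus \delta_S .
\end{align*}
```

The first equality uses only the linearity of $\mathrm{MC}^{-1}$ and $\mathrm{SR}^{-1}$ over $\oplus$.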



Fig. 7. Reduced cipher composed of 3 AES rounds, the addition of $\delta_S$ and 3 inverse
AES rounds. The distinguisher covers the middle part of this cipher.


The first step of our attack is to guess a diagonal of the first round key, which
allows us to compute a column after the first round and before the last round. Next
we focus on the middle rounds. The middle rounds have only one MixColumn
operation, and one InverseMixColumn without byte shuffling in between. Therefore
it can be seen as four parallel 32-bit functions, acting on each diagonal
(similar to the Super-SBox technique [4]). Note that if the key guess is wrong,
the resulting function cannot be decomposed into 4 parallel functions. For each
function, 1 input byte and 1 output byte are known. We describe below two
different distinguishers for the middle rounds, which lead to key-recovery attacks
with similar complexities. The first one is based on a rare event that can easily
be detected, the second one relies on the detection of a statistical bias in the
generic case.


Collision-based Distinguisher. For our first distinguisher, we focus on the
constant $\delta_S$. We only know that $\delta_S$ is non-zero on the full state. Considering that
it is distributed uniformly on non-zero constants, it cancels on a given diagonal$^3$
with probability $(2^{96} - 1)/(2^{128} - 1) \approx 2^{-32}$. Since there are 4 diagonals, an
average of $2^{30}$ different choices of $AD$ are necessary to reach a value of $\delta_S$ that
cancels on one of the diagonals. Let us consider that it occurs on the first diagonal
(w.l.o.g.), which contains bytes 0, 7, 10, 13. Then, the values of these bytes collide
before and after the AddDeltaS operation. Then, the values of the first column of the
block (bytes 0, 1, 2, 3) are not affected by the middle rounds. If we continue the
decryption process towards both ends of the modified version of the AES, the collision
passes through the InverseMixColumn operation. After undoing the ShiftRow, SubByte
and AddKey operations, we notice that the values of bytes 0, 5, 10, 15 are
equal at the beginning and at the end of the middle part of the cipher.
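
The orders of magnitude above are easy to confirm. The sketch below only evaluates
the stated probability with exact fractions; the union bound over the 4 diagonals is
our simplification (the overlap between diagonals is negligible at this scale), and
the variable names are hypothetical.

```python
from fractions import Fraction
from math import log2

# Probability that a uniformly random non-zero 128-bit constant is zero on one fixed diagonal.
p_one_diagonal = Fraction(2**96 - 1, 2**128 - 1)

# Union bound over the 4 diagonals.
p_any_diagonal = 4 * p_one_diagonal

print(log2(p_one_diagonal))       # very close to -32
print(log2(1 / p_any_diagonal))   # very close to 30: expected number of choices of AD
```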

For each choice of $AD$, we then generate 3 (plaintext-ciphertext) pairs $(\bar{P}_x, \bar{P}'_x)$
for 3 values of $x$. Then, we proceed as follows.

In each of the $2^{30}$ sets, we guess separately the 32 bits on each of the 4
anti-diagonals$^4$ of the first round key. This enables us to compute one full column of the
state before and after the middle rounds, for each value of $x$. For each byte $b_i$
in this column, we store a list $L_i$ of the key values such that the input byte and
the output byte of the middle rounds are equal for each $x$.

Then we consider the first diagonal before and after the middle rounds.
It contains bytes 0, 5, 10 and 15 of the block. Remember that the diagonals
contain the inputs and outputs of 4 independent functions $F_i$. From the 4 lists
of partial keys $L_j$, $j = 0, 5, 10, 15$, we can build all the keys such that the input
of $F_i$ collides with the output for each value of $x$. Using the known plaintexts
and ciphertexts for the full cipher, we can try all these keys as candidates. Then,
we repeat the whole process with the other three diagonals.

We now explain why this attack works. 

Filtering Keys. Following the analysis above, the right key can be decomposed
into 4 partial keys covering each diagonal of the block. If $\delta_S$ cancels on one
of the columns, then the partial values of the right key will appear on the four
corresponding lists $L_j$, and the full key will be among the combinations of elements
of the four lists. Therefore, the right key will be detected by our algorithm.

False Positives. Each wrong partial 32-bit key is stored in the corresponding
list $L_j$ if the input and output of $F_i$ collide on byte $j$, for each of the 3 values
of $x$. This occurs with probability $2^{-24}$, if we consider the input and output
byte computed for $F_i$ as independent. Therefore, with $2^{32}$ partial keys, we have on
average $2^8$ false positives for each of the 4 diagonals of the key. Considering that the
numbers of false positives are independent for each diagonal of the key, there are on
average $(2^8)^4 = 2^{32}$ keys to try, for each of the 4 diagonals and each of the $2^{30}$ sets
of values. The expected number of key candidates is marginally increased to
$(2^8 + 1)^4 \approx 2^{32}$ when the difference $\delta_S$ cancels on the diagonal, as each set of
partial keys at least contains the right key.

$\delta_S \neq 0$ on column $i$. As above, each wrong key guess is stored with probability
$2^{-24}$, which leads to $(2^8)^4$ false positives on average, that are discarded by
exhaustive search.

3 Defined as the images of columns by the ShiftRow operation.

4 Defined as the pre-images of columns by the ShiftRow operation.

Summary of the Attack. We focus on the key recovery attack on the mask-less
version of Marble. In the decryption-misuse setting, it requires the decryption of
$6 \times 2^{30}$ ciphertexts composed of 130 blocks of plaintext and 1 block of associated
data, which correspond to $2^{30}$ sets of 3 pairs. To build the lists of partial keys,
one has to perform $6 \times 1/4$ of an AES round for each partial key guess, leading
to a total of $3 \times 2^{31}$ AES rounds, for each set and each diagonal. The average
complexity of this step for the full attack is then $3 \times 2^{63}$ AES rounds. The most
time-consuming part of the attack is the exhaustive search among the remaining
candidates, which requires $2^{64}$ AES encryptions on average ($2^{32}$ per column and
per set).
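
These totals follow directly from the parameters stated above ($2^{32}$ guesses per
partial key, 4 diagonals, $2^{30}$ sets). As a worked check:

```latex
\begin{align*}
\text{list building:}\quad
  & 2^{32} \times \bigl(6 \times \tfrac{1}{4}\bigr) = 3 \times 2^{31}
    \ \text{AES rounds per set and per diagonal}, \\
  & 3 \times 2^{31} \times 4 \times 2^{30} = 3 \times 2^{63}
    \ \text{AES rounds in total}; \\
\text{exhaustive search:}\quad
  & 2^{32} \times 4 \times 2^{30} = 2^{64}
    \ \text{AES encryptions}.
\end{align*}
```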


Linear Cryptanalysis. The method described in Sect. 5.1 leads to the knowl- 
edge of plaintext-ciphertext pairs for a cipher that consists of 3 AES rounds, a 
key addition and 3 inverse AES rounds. The adversary therefore targets a cipher 
with a reduced number of rounds, in a known plaintext setting. Using linear 
cryptanalysis therefore seems a natural idea. As shown above, one can guess 4 
key bytes, which leads to the knowledge of 4 input and 4 output bytes of the 
inner 4 rounds of this cipher. 

In [9], Keliher and Sui demonstrate that the maximum expected linear
probability over 2 AES rounds is about $\mathrm{LP} \approx 1.638 \times 2^{-28}$. In our case, we can
concatenate a linear trail with its inverse. When averaging over the possible
values of the key and of the intermediate difference $\delta_S$, the maximum expected
probability for a 4-round characteristic would be about $\mathrm{LP}^2 \approx 1.342 \times 2^{-55}$.
This number also gives an estimation of the amount of data required for the
attack to work. Even by taking into account the possible bias due to the linear
hull effect, the complexity of the linear attack is expected to be far higher than
the one suggested by the experiments below.

A refinement of the linear attack consists in noticing that between the two
middle rounds, each byte of the block is affected only by a key byte and a byte
of $\delta_S$, but not by other bytes of the block. Therefore, the two middle Sbox layers
could be expressed as one layer of 8-bit key-dependent Sboxes, leading to trails
with 6 active Sboxes instead of 10. Nevertheless, the best linear trail would
then depend on the unknown value of $\delta_S$, which would make it hard to exploit.
Instead, we use the following statistical distinguisher.
Statistical Distinguisher. Intuitively, if we have many partial input/output
pairs, we should detect some correlation between the inputs and outputs. Indeed,
when the key guess is wrong, the function composing the distinguisher behaves as
a 128-bit permutation instead of the parallel application of four 32-bit functions.
Hence, the input and output bytes are less correlated. We focus on a property
that does not require knowing in advance which values are correlated, and works
for any function based on (four 32-bit) parallel permutations.

For each known plaintext/ciphertext, we partially encrypt/decrypt one round
on a specific diagonal and we denote one known input/output byte of the
distinguisher by $(\alpha, \beta)$ respectively. It is possible to take into account the four known
input/output pairs, but the distinguisher presented below works with only one
position and is easier to explain. We use $2^{16}$ counters $c_{\alpha,\beta}$ to count how many
times each pair $(\alpha, \beta)$ occurs with $D$ available data. If the key guess is correct,
there should be some correlation between $\alpha$ and $\beta$, which results in a higher
value for some counters (and lower values for the other counters). In order to
detect this effect, we compute the sample variance $s^2$ of the $2^{16}$ counters:

    $s^2 = 2^{-16} \sum_{\alpha,\beta} (c_{\alpha,\beta} - \bar{c})^2$,  where  $\bar{c} = 2^{-16} \sum_{\alpha,\beta} c_{\alpha,\beta}$.

We expect that $s^2$ is higher when the key guess is correct, because of the
correlation between $\alpha$ and $\beta$. For a wrong key guess, the computation between $\alpha$ and
$\beta$ cannot be decomposed into 4 parallel functions, and this correlation should
vanish. The resulting attack is described by Algorithm 3.


Algorithm 3. Recover the key of a reduced AES (3 direct rounds and 3 inverse
rounds)

    Input: Plaintext/ciphertext pairs $(P, C)$
    for $0 \le K < 2^{32}$ do                                        ▷ Partial key guess
        Initialize $c_{\alpha,\beta} = 0$
        for all $(P, C)$ do
            Compute $\alpha, \beta$
            $c_{\alpha,\beta} \leftarrow c_{\alpha,\beta} + 1$
        end for
        $\bar{c} \leftarrow 2^{-16} \sum_{\alpha,\beta} c_{\alpha,\beta}$
        $s^2[K] \leftarrow 2^{-16} \sum_{\alpha,\beta} (c_{\alpha,\beta} - \bar{c})^2$
    end for
    return $\arg\max_K s^2[K]$
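
The effect measured by this statistic can be observed on a scaled-down model. The
Python sketch below is not the attack on AES: it uses 16-bit blocks and 4-bit
"bytes", models a wrong guess as a random 16-bit permutation and a right guess as
two independent 8-bit permutations applied in parallel, and uses hypothetical
names throughout. It only illustrates why the sample variance of the counters is
larger when the function splits into small parallel pieces.

```python
import random


def random_permutation(bits: int, rng: random.Random):
    """Return a uniformly random permutation of {0, ..., 2^bits - 1} as a table."""
    table = list(range(1 << bits))
    rng.shuffle(table)
    return table


def sample_variance(func, n_samples: int, rng: random.Random) -> float:
    """Sample variance of the 2^8 counters c[alpha, beta] over random inputs."""
    counters = [0] * 256
    for _ in range(n_samples):
        x = rng.randrange(1 << 16)
        alpha, beta = x >> 12, func(x) >> 12   # top nibble of input and of output
        counters[(alpha << 4) | beta] += 1
    mean = sum(counters) / 256
    return sum((c - mean) ** 2 for c in counters) / 256


rng = random.Random(2015)
full = random_permutation(16, rng)             # toy model of a wrong key guess
left, right = random_permutation(8, rng), random_permutation(8, rng)


def wrong_guess(x: int) -> int:
    return full[x]


def right_guess(x: int) -> int:
    # Two independent 8-bit permutations applied in parallel (toy right guess).
    return (left[x >> 8] << 8) | right[x & 0xFF]


D = 1 << 12
s2_wrong = sample_variance(wrong_guess, D, rng)
s2_right = sample_variance(right_guess, D, rng)
print(s2_wrong, s2_right)   # the second value is expected to be much larger
```

In this toy setting the mean counter value is $D / 2^8 = 16$ in both cases; only the
spread of the counters differs, which is exactly what Algorithm 3 ranks on.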


In order to analyze this algorithm, we model the counters using random
variables $C_{\alpha,\beta}$ and the sample variance as $S^2$ for a wrong key guess, and $S'^2$
for the right key. Our goal is to show that when $D$ is large enough, we have
$\Pr[S^2 > S'^2]$ negligible, i.e. the correct key is ranked first.
Wrong Key Guess. We know that starting from $\alpha$, if we revert the initial round
with the wrong key, then compute three rounds forward with the correct keys,
add $\delta_S$, compute three rounds backwards with the correct keys, and finally one
round forward with the wrong key, we reach a state with $\beta$. Therefore, $\alpha$ and $\beta$
are partial inputs/outputs of a 128-bit permutation.

If we model this function by a random 128-bit permutation, the number of
data matching a pair $(\alpha, \beta)$, in images and pre-images of this 128-bit function,
follows a hypergeometric distribution. Indeed, for each input whose first byte
has value $\alpha$, the output is drawn uniformly without replacement among all the
possible outputs of the function. The success is determined by whether the first
byte of the output equals $\beta$. The number of draws is $2^{120}$, and there are $2^{120}$
success cases among $2^{128}$ possible values.

The probability of success of a single draw is $p = 2^{120}/2^{128} = 2^{-8}$. Let us call
$X_{\alpha,\beta}$ the resulting number of successes, i.e. the number of inputs with first byte
$\alpha$ whose output has first byte $\beta$. Hence we have

    $E[X_{\alpha,\beta}] = (2^{120})^2 / 2^{128} = 2^{112}$
    $\mathrm{Var}[X_{\alpha,\beta}] = (2^{120})^2 / 2^{128} \times (1 - 2^{-8})^2 / (1 - 2^{-128}) \approx 2^{112} - 2^{105}$.

Next we study $Y_{\alpha,u}$, the number of times each value $\alpha, u$ is reached with $D$
samples, for each possible value $u$ of the remaining 15 bytes of the input of $F$.
The $Y_{\alpha,u}$ follow a multinomial distribution, with:

    $E[Y_{\alpha,u}] = 2^{-128} D$,
    $\mathrm{Var}[Y_{\alpha,u}] = 2^{-128} (1 - 2^{-128}) D$,
    $\mathrm{Var}[Y_{\alpha,u}, Y_{\alpha,u'}] = -2^{-256} D$.

Let us denote by $S_{\alpha,\beta}$ the set of values $u$ such that $F(\alpha, u) = (\beta, v)$ for some $v$.
It contains exactly $X_{\alpha,\beta}$ elements. The counters $C_{\alpha,\beta}$ can then be expressed as

    $C_{\alpha,\beta} = \sum_{u \in S_{\alpha,\beta}} Y_{\alpha,u}$.

The variables $Y_{\alpha,u}$ all follow the same distribution. From the law of total
variance, we have:

    $\mathrm{Var}[C_{\alpha,\beta}] = E_{X_{\alpha,\beta}}\Bigl[\mathrm{Var}\Bigl[\sum_{u \in S_{\alpha,\beta}} Y_{\alpha,u} \Bigm| X_{\alpha,\beta}\Bigr]\Bigr] + \mathrm{Var}_{X_{\alpha,\beta}}\Bigl[E\Bigl[\sum_{u \in S_{\alpha,\beta}} Y_{\alpha,u} \Bigm| X_{\alpha,\beta}\Bigr]\Bigr]$

After expanding the sums and reordering the terms to make variances and
covariances of the $Y_{\alpha,u}$ appear, we get:

    $\mathrm{Var}[C_{\alpha,\beta}] = E\bigl[X_{\alpha,\beta} \mathrm{Var}[Y_{\alpha,u}] + (X_{\alpha,\beta}^2 - X_{\alpha,\beta}) \mathrm{Var}[Y_{\alpha,u}, Y_{\alpha,u'}]\bigr] + \mathrm{Var}\bigl[X_{\alpha,\beta} E[Y_{\alpha,u}]\bigr]$
    $= E[X_{\alpha,\beta}] \mathrm{Var}[Y_{\alpha,u}] + E[X_{\alpha,\beta}^2 - X_{\alpha,\beta}] \mathrm{Var}[Y_{\alpha,u}, Y_{\alpha,u'}] + E[Y_{\alpha,u}]^2 \mathrm{Var}[X_{\alpha,\beta}]$
    $= E[X_{\alpha,\beta}] \mathrm{Var}[Y_{\alpha,u}] + \bigl(\mathrm{Var}[X_{\alpha,\beta}] + E[X_{\alpha,\beta}]^2 - E[X_{\alpha,\beta}]\bigr) \mathrm{Var}[Y_{\alpha,u}, Y_{\alpha,u'}] + E[Y_{\alpha,u}]^2 \mathrm{Var}[X_{\alpha,\beta}]$

We have numeric expressions for each term of this expression, therefore we
can compute the following approximation:

    $\mathrm{Var}[C_{\alpha,\beta}] \approx 2^{-16} D + 2^{-144} D^2$.

Correct Key. Let us now assume that the key guess is correct, i.e. the pairs $(\alpha, \beta)$
are valid partial inputs/outputs, but of a 32-bit function this time. We can then
re-apply the above analysis by adjusting the parameters to fit the 32-bit function.
In this case, $X_{\alpha,\beta}$ denotes the number of partial values of the data matching
the pair $(\alpha, \beta)$ in the right column. In that case, we have a hypergeometric
distribution with $2^{24}$ draws without replacement from a set of $2^{32}$ values, among
which $2^{24}$ define a success event.

Therefore, we have

    $E[X_{\alpha,\beta}] = (2^{24})^2 / 2^{32} = 2^{16}$
    $\mathrm{Var}[X_{\alpha,\beta}] = (2^{24})^2 / 2^{32} \times (1 - 2^{-8})^2 / (1 - 2^{-32}) \approx 2^{16} - 2^{9}$.

Similarly, we can define variables $Y_{\alpha,u}$ as the number of times a given input
of the 32-bit function $F$ is reached among $D$ samples, drawn uniformly. As in
the previous case, the $Y_{\alpha,u}$ follow a multinomial distribution, with:

    $E[Y_{\alpha,u}] = 2^{-32} D$,
    $\mathrm{Var}[Y_{\alpha,u}] = 2^{-32} (1 - 2^{-32}) D$,
    $\mathrm{Var}[Y_{\alpha,u}, Y_{\alpha,u'}] = -2^{-64} D$.

The same formula can be used to compute the variance of the counters $C_{\alpha,\beta}$.
We get approximately:

    $\mathrm{Var}[C_{\alpha,\beta}] \approx 2^{-16} D + 2^{-48} D^2$.


Distinguisher. We obtain an efficient distinguisher with $D = 2^{32}$: for a wrong
key guess, the variance of the counter is about $2^{16}$, but it is about $2^{17}$ for the
right key. In order to evaluate the probability that the correct key is ranked
first, we must evaluate how far the sample variance $s^2$ is from the true variance
$\mathrm{Var}[C_{\alpha,\beta}]$.
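
The two figures can be recomputed from the formula obtained with the law of total
variance. The following Python sketch evaluates it with exact fractions for both
models; the function name counter_variance and its parameters are hypothetical,
and the code simply plugs in the expectations and variances listed above.

```python
from fractions import Fraction


def counter_variance(block_bits: int, D: int) -> Fraction:
    """Var[C] when one byte of input/output of a block_bits-bit function is observed."""
    N = Fraction(2) ** block_bits        # number of inputs of the function
    n = N / 256                          # inputs whose first byte equals alpha
    exp_x = n * n / N                    # E[X]
    var_x = exp_x * (1 - Fraction(1, 256)) ** 2 / (1 - 1 / N)   # Var[X]
    exp_y = D / N                        # E[Y]
    var_y = (1 / N) * (1 - 1 / N) * D    # Var[Y]
    cov_y = -D / N**2                    # Var[Y, Y'] (covariance of two distinct Y's)
    return exp_x * var_y + (var_x + exp_x**2 - exp_x) * cov_y + exp_y**2 * var_x


D = 2**32
var_wrong = counter_variance(128, D)     # wrong guess: 128-bit permutation
var_right = counter_variance(32, D)      # right guess: 32-bit function
print(float(var_wrong) / 2**16, float(var_right) / 2**17)   # both ratios are close to 1
```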

For a wrong key guess, if we use a single counter and repeat the experiment
with $2^{16}$ independent sets of $D$ plaintexts, each counter $C_{\alpha,\beta}$ can be
approximated by a binomial distribution with parameters $D$ and $p = 2^{-16}$. If we
approximate them as a normal distribution with parameters $\mu = 2^{-16} D$ and
$\sigma = \sqrt{2^{-16} (1 - 2^{-16}) D}$, we know that the distribution of the sample variance $S^2$
for a wrong key guess follows a $\chi^2$ distribution [10, Proposition 2.11]:

    $S^2 \sim \frac{\sigma^2}{n - 1} \chi^2_{n-1} \approx 2^{-32} D \, \chi^2_{2^{16} - 1}$

In particular, we have $\Pr[S^2 > 2^{16} + 2^{12}] < 2^{-90}$, therefore we don't expect
that the sample variance of a wrong key is above $2^{16} + 2^{12}$. In practice, we
use a single set of $D$ plaintexts, and we evaluate the sample variance of the $2^{16}$
counters; our experiments show that the distribution is close to a $\chi^2$ distribution
(the maximum value of $s^2$ with $2^{16}$ samples was $2^{16} + 1420$).
For the right key, we don't have an analytic expression of the distribution
of the sample variance, but we can perform experiments. Our experiments show
that with very high probability $S'^2 > 2^{16} + 2^{12}$, as seen in Fig. 8. Of our $2^{16}$
experiments, the minimum value of $s^2$ was $102795 \approx 2^{16} + 2^{15}$. Using $D = 2^{32}$,
we have a large margin and we expect the attack to work with significantly less
data, but recovering $L$ will be the bottleneck of the attack.

While this attack does not use any property of the parallel 32-bit function, we
expect that it can be improved in the specific case of AES rounds. In particular,
we notice a small peak around $3 \times 2^{16}$ in Fig. 8, which is due to zero bytes in $\delta_S$,
and it should be possible to take advantage of this.

Fig. 8. Experimental results: distribution of the sample variance $S^2$ and $S'^2$ with
$D = 2^{32}$ ($2^{16}$ experiments with random keys).


6 Conclusion 

Our results show that collision attacks can have a strong impact on the security
of authenticated encryption schemes. It seems that extracting the whitening key
using collisions is possible in many OCB-based designs, and this can sometimes
be extended into a full key-recovery attack.

On AEZ, we show how to recover the whitening key, and we point out that the 
key derivation of AEZ v2 and v3 has the unfortunate property that the master 
key can easily be recovered from the whitening key. This allows a complete break 
after the birthday bound. Even with a limit on the amount of data processed 
with a single key, this still gives an attack with a higher success probability than 
generic attacks. While this does not violate the security proof of AEZ, we believe 
it would be better to avoid this property. 

Our results on Marble show that it does not provide the security features
claimed by the author, i.e. beyond birthday bound security and
decryption-misuse resistance. We note that Marble still offers a stronger security than many
CAESAR candidates and classical modes of operation when using nonces (or
unique AD). Once usage requirements are relaxed, our results also show that the
security of Marble is similar to the security of other misuse resistant CAESAR
candidates (e.g. APE-128, POET, AEZ, Minalpher) but it collapses badly after
the birthday bound under a decryption-misuse setting, even leading to a full key
recovery.

It seems that adding one extra operation on the state between the processing
of the associated data and of the message would avoid our attacks, but a thorough
analysis would be necessary to ensure that the resulting construction is secure.
As our attack heavily relies on the fact that $S_1$ and $S_2$ keep the same values
on two different plaintexts, one could xor a constant block (for example, 16
bytes encoding the byte positions in the block, (0, 1, ..., 15)) on $S_1$ and $S_2$ after
processing the associated data.

In addition, our key-recovery attack on Marble suggests that 4 AES rounds in
the E functions are insufficient if the adversary can find a shortcut to target two
E functions instead of three. In particular, this suggests that a deeper
investigation of the security of AEZ with a modified key schedule would be interesting.

To the best of our knowledge, the statistical distinguisher presented to recover the
encryption key of a reduced-round AES has never been used before. Although
it only allows attacking a few rounds, it seems to be more efficient than a classical
linear attack. We believe that this kind of distinguisher is promising enough
to benefit from further research.
Abstract. LowMC is a collection of block cipher families introduced at
Eurocrypt 2015 by Albrecht et al. Its design is optimized for instantiations
of multi-party computation, fully homomorphic encryption, and
zero-knowledge proofs. A unique feature of LowMC is that its internal
affine layers are chosen at random, and thus each block cipher family
contains a huge number of instances. The Eurocrypt paper proposed
two specific block cipher families of LowMC, having 80-bit and 128-bit
keys.

In this paper, we mount interpolation attacks (algebraic attacks introduced
by Jakobsen and Knudsen) on LowMC, and show that a practically
significant fraction of $2^{-38}$ of its 80-bit key instances could be broken $2^{23}$
times faster than exhaustive search. Moreover, essentially all instances
that are claimed to provide 128-bit security could be broken about 1000
times faster. In order to obtain these results we optimize the interpolation
attack using several new techniques. In particular, we present an
algorithm that combines two main variants of the interpolation attack,
and results in an attack which is more efficient than each one.


Keywords: Block cipher • LowMC • High-order differential 
cryptanalysis • Interpolation attack 


1 Introduction 

LowMC is a collection of block cipher families designed by Albrecht et al. and
presented at Eurocrypt 2015. The cipher is specifically optimized for practical
instantiations of multi-party computation, fully homomorphic encryption, and
zero-knowledge proofs. In such applications, non-linear operations result in a
heavy computational penalty compared to linear ones. The designers of LowMC
took an extreme approach, combining very dense affine layers with simple
non-linear layers that have algebraic degree of 2.

Perhaps the most distinctive feature of LowMC is that its affine layers are 
chosen at random, and thus each block cipher family contains a huge number 
of instances. As this may enable a malicious party to instantiate LowMC with 
a hidden backdoor, its designers propose to use the Grain stream cipher [3] as 
a source of pseudo-random bits in order to restrict the freedom available in the 
LowMC instantiation. The designers also mention that it is possible to use any 
sufficiently random source to generate the affine layers, and this source does not 
necessarily need to be cryptographically secure. 

The Eurocrypt paper proposed two specific block cipher families of LowMC, 
having 80-bit and 128-bit keys. The internal number of rounds in each family 
was set in order to guarantee a security level that corresponds to its key size. 
For this purpose, the resistance of LowMC was evaluated against a variety of 
well-known cryptanalytic attacks. One of the main considerations in setting the
internal number of rounds was to provide resistance against algebraic attacks 
(such as high-order differential cryptanalysis [7]). Indeed, LowMC is potentially 
susceptible to algebraic attacks due to the low algebraic degree of its internal 
round, but the designers argue that LowMC has sufficiently many rounds to 
resist such attacks. 

In this paper, we evaluate the resistance of LowMC against algebraic attacks
and refute the designers' claims regarding its security level. Our results are given
in Table 1, and show that a fraction of $2^{-38}$ of the LowMC 80-bit key instances
could be broken in about $2^{57}$ time, using $2^{39}$ chosen plaintexts. The probability
of $2^{-38}$ is practically significant, namely, a malicious party can easily find weak
instances of LowMC by running its source of pseudo-random bits with sufficiently
many seeds, and checking whether the resultant instance is weak (which can be
done efficiently using basic linear algebra).

For LowMC with 128-bit keys, we describe an attack that breaks a fraction
of $2^{-122}$ of its instances in time $2^{86}$ using $2^{70}$ chosen plaintexts. We note that
this specific attack does not violate the formal security claims of the LowMC
designers, as they do not consider attacks that apply to less than $2^{-100}$ of the
instances as valid. Nevertheless, the designers of LowMC allow it to be instantiated
using a pseudo-random source that is not cryptographically secure. Our result
shows that this is risky, as using an over-simplified source for pseudo-randomness
may give a malicious party additional control over the LowMC instantiation, and
allow finding weak instances much faster than exhaustively searching for them
in $2^{122}$ time.

Finally, we describe an attack that can break essentially all LowMC instances
with 128-bit keys. Although the attack is significantly slower than the
weak-instance attack, it is still about 1000 times faster than exhaustive search, and
uses $2^{73}$ chosen plaintexts.

All of our results were obtained using the interpolation attack, which is an
algebraic attack introduced by Jakobsen and Knudsen in 1997 [4]. In an
interpolation attack, the attacker considers some intermediate encryption value $b$ as
a polynomial in the ciphertext bits. The aim of the attacker is to interpolate the
algebraic normal form (ANF) of $b$ by recovering its unknown coefficients, and
this typically allows recovering the secret key using ad-hoc techniques.

Table 1. Attacks on LowMC

Instance     Number of   Section   Rounds      Fraction of   Data†      Time††      Memory†††
Family       Rounds                Attacked    Instances
LowMC-80     11          6.1       9           1             $2^{35}$   $2^{38}$    $2^{35}$
                         6.2       10          1             $2^{39}$   $2^{57}$    $2^{39}$
                         6.3       all (11)    $2^{-38}$     $2^{39}$   $2^{57}$    $2^{39}$
LowMC-128    12          7.1       11          1             $2^{70}$   $2^{86}$    $2^{70}$
                         7.1       all (12)    $2^{-122}$    $2^{70}$   $2^{86}$    $2^{70}$
                         7.2       all (12)    1             $2^{73}$   $2^{118}$   $2^{80}$

†   Given in chosen plaintexts.
††  Given in LowMC encryptions.
††† Given in 256-bit words.

In order to recover the unknown coefficients, the attacker allocates a variable 
for each one of them. Assuming that b has a low-degree representation in terms of 
the plaintext bits, the attacker collects linear equations on the variables, typically 
by using high-order differentials in a chosen plaintext attack. After obtaining 
sufficiently many equations, the unknown variables are recovered by solving the 
resultant linear equation system. The efficiency of the attack depends on the 
algebraic degree of b in terms of the plaintext, but also on the number of allocated 
variables which is determined by the number of unknown coefficients in the ANF 
representation of b in terms of the ciphertext. 

Although our results were obtained using the well-known interpolation
attack, its straightforward application does not seem to threaten the security
of LowMC. Therefore, we had to develop new techniques such as using carefully
chosen plaintext structures which allow us to efficiently derive the linear system of
equations. However, our main new contribution is described next by considering
two variants of the interpolation attack.

In the original variant of the interpolation attack over GF(2) (which we refer
to as variant 1), the attacker views the ANF of some intermediate encryption bit
$b$ as an initially unknown polynomial $F_K(C)$ in the ciphertext bits $C = c_1, \ldots, c_n$,
where $K = x_1, \ldots, x_\kappa$ is the unknown (fixed) secret key. In a dual approach to the
interpolation attack, which we refer to as variant 2 (used, for example, in [8]), the
attacker interpolates the full polynomial $F(K, C)$ by considering each monomial
in the key bits $x_1, \ldots, x_\kappa$ with a non-zero coefficient as a separate (linearized)
variable. For example, consider the polynomial

    $F(c_1, c_2, x_1, x_2, x_3) = c_1 c_2 x_1 + c_1 c_2 x_2 + c_1 x_1 + c_1 x_2 + c_2 x_1 + x_1 x_2 + x_3 + 1$.

We can write

    $F_{(x_1, x_2, x_3)}(c_1, c_2) = \alpha_1 c_1 c_2 + \alpha_2 c_1 + \alpha_3 c_2 + \alpha_4$,

and thus in the first variant we have 4 variables: $\alpha_1, \alpha_2, \alpha_3, \alpha_4$. In this variant,
the actual representation of the variables in terms of the key is not considered.
In the dual variant, we write

    $F(c_1, c_2, x_1, x_2, x_3) = x_1 x_2 (1) + x_1 (c_1 c_2 + c_1 + c_2) + x_2 (c_1 c_2 + c_1) + x_3 (1) + 1$,

and we have 4 variables: $x_1 x_2$, $x_1$, $x_2$, $x_3$.

The advantage of variant 2 over the first variant is that it directly recovers the 
secret key, and furthermore, in some cases it may result in a smaller number of 
variables in the equation system. At the same time, in order to derive the actual 
equation system the attacker has to evaluate the polynomial F for each cipher- 
text. This process is less efficient in variant 2, since each evaluation of F(K, C ) is 
expensive (it requires evaluating all the complex ciphertext expressions that are 
multiplied with the variables), whereas in variant 1 each evaluation of Fk{C) 
is relatively simple (it requires evaluating simple monomials in the ciphertext). 
Therefore, the choice of which variant to use in order to optimize the attack 
depends on the underlying cryptosystem. 

Our main idea is to combine the two dual variants of the interpolation attack:
we first derive the equation system efficiently using the original variant of [4]. 
Then, we transform a carefully chosen variable subset to variables which are 
linearized monomials in the key bits, as in variant 2. This results in a mixed 
variable set that is smaller than the variable sets of each variant. Consequently, 
we obtain an attack which is more efficient than each one of the two variants. 

In our example above, we can express $\alpha_1 = x_1 + x_2$, $\alpha_2 = x_1 + x_2$ and
$\alpha_3 = x_1$, resulting in only 3 variables: $x_1, x_2, \alpha_4$. Obviously, our toy example
merely demonstrates the idea at a very high level, and the actual choice of 
which variables to transform as well as the analysis of the resultant algorithm 
are more involved. 
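
The toy example can be checked mechanically. The following Python sketch is an illustration only (the function names `variant1`, `variant2` and `mixed` are ours, not the paper's): it evaluates the polynomial in its original form, in the two variant forms, and in the mixed form with the variable set $\{x_1, x_2, \alpha_4\}$, and confirms that all four agree on every input.

```python
from itertools import product

def F(c1, c2, x1, x2, x3):
    """The toy polynomial from the text, evaluated over GF(2)."""
    return (c1*c2*x1 + c1*c2*x2 + c1*x1 + c1*x2 + c2*x1 + x1*x2 + x3 + 1) % 2

def variant1(c1, c2, x1, x2, x3):
    # Variant 1: one variable per ciphertext monomial.
    a1 = (x1 + x2) % 2           # coefficient of c1*c2
    a2 = (x1 + x2) % 2           # coefficient of c1
    a3 = x1                      # coefficient of c2
    a4 = (x1*x2 + x3 + 1) % 2    # constant term
    return (a1*c1*c2 + a2*c1 + a3*c2 + a4) % 2

def variant2(c1, c2, x1, x2, x3):
    # Variant 2: one variable per key monomial x1*x2, x1, x2, x3.
    return (x1*x2 + x1*(c1*c2 + c1 + c2) + x2*(c1*c2 + c1) + x3 + 1) % 2

def mixed(c1, c2, x1, x2, a4):
    # Mixed variable set {x1, x2, a4}: a1, a2, a3 are rewritten in key bits.
    return ((x1 + x2)*c1*c2 + (x1 + x2)*c1 + x1*c2 + a4) % 2

all_agree = all(
    F(*v) == variant1(*v) == variant2(*v)
    == mixed(v[0], v[1], v[2], v[3], (v[2]*v[3] + v[4] + 1) % 2)
    for v in product((0, 1), repeat=5)
)
print(all_agree)
```

The mixed form needs only three unknowns because $\alpha_1$, $\alpha_2$ and $\alpha_3$ are linear in $x_1, x_2$, while $\alpha_4$ is kept as an opaque variable.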

The paper is organized as follows. In Sect. 2 we give some preliminaries, while 
in Sect. 3 we give a brief description of LowMC. Our basic attack on 9-round 
LowMC with an 80-bit key is described in Sect. 4, while our generic framework 
for optimized interpolation attacks is described in Sect. 5. In Sects. 6 and 7 we 
apply our optimized attack to LowMC with 80 and 128-bit keys, respectively. 
Finally, we conclude the paper in Sect. 8. 

2 Preliminaries 

In this section, we describe preliminaries that are used in the rest of the paper. 


2.1 Boolean Algebra 

For a finite set $S$, denote by $|S|$ its size. Given a vector $u = (u_1, \ldots, u_n) \in
GF(2^n)$, let $wt(u)$ denote its Hamming weight.

Any function $F$ from $GF(2^n)$ to $GF(2)$ can be described as a multivariate
polynomial, whose algebraic normal form (ANF) is unique and given as

$$F(x_1, \ldots, x_n) = \sum_{u = (u_1, \ldots, u_n) \in GF(2^n)} \alpha_u M_u,$$

where $\alpha_u \in \{0, 1\}$ is the coefficient of the monomial
$M_u = \prod_{i=1}^{n} x_i^{u_i}$ and the sum is over GF(2). The algebraic degree of
the function $F$ is defined as $deg(F) = \max\{wt(u) \mid \alpha_u \neq 0\}$. Therefore, a
function $F$ with a degree bounded by $d \le n$ can be described using
$\sum_{i=0}^{d} \binom{n}{i}$ coefficients. To simplify our notations, we define
$\binom{n}{\le d} = \sum_{i=0}^{d} \binom{n}{i}$.

The ANF coefficient $\alpha_u$ of $F$ can be interpolated by summing (over GF(2))
over $2^{wt(u)}$ evaluations of $F$: define the set of inputs $S$ to contain all the
$2^{wt(u)}$ n-bit vectors whose bits set to 1 are a subset of the bits set to 1 in
$u_1, \ldots, u_n$. More formally, let $S = \{x = (x_1, \ldots, x_n) \mid \bar{u} \wedge x = 0\}$
(where $\bar{u}$ is bitwise NOT applied to $u$, and $\wedge$ is bitwise AND), then
$\alpha_u = \sum_{(x_1, \ldots, x_n) \in S} F(x_1, \ldots, x_n)$. Note that this implies
that a function $F$ with a degree bounded by $d \le n$ can be fully interpolated
given its evaluations on the set of $\binom{n}{\le d}$ inputs whose Hamming
weight is at most d, namely $\{x = (x_1, \ldots, x_n) \mid wt(x) \le d\}$.
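
As a small illustration of this summation rule, the Python sketch below recovers individual ANF coefficients by XOR-ing a function over the sub-cube $S$. Vectors are encoded as integers (bit i of the integer is $x_{i+1}$); this encoding and the example function are our assumptions, chosen only to keep the sketch short.

```python
def anf_coefficient(f, u):
    """Return the ANF coefficient of the monomial indexed by bitmask u.

    XORs f over all x whose set bits are a subset of the set bits of u,
    i.e. all x with (~u & x) == 0.
    """
    total = 0
    x = u
    while True:
        total ^= f(x)
        if x == 0:
            break
        x = (x - 1) & u      # step to the next submask of u
    return total

# Example function on 3 bits: f = x0*x1 + x2 + 1.
def f(x):
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return (x0 & x1) ^ x2 ^ 1

coeffs = {u: anf_coefficient(f, u) for u in range(8)}
```

Only the masks `0b000` (constant), `0b011` ($x_0 x_1$) and `0b100` ($x_2$) receive coefficient 1, matching the definition of `f`.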

Given the truth table of an arbitrary function $F$ (as a bit vector of $2^n$ entries),
the ANF of $F$ can be represented as a bit vector of $2^n$ entries, corresponding
to its $2^n$ coefficients $\alpha_u$. This ANF representation can be efficiently computed
using the Moebius transform, which is an FFT-like algorithm. The Moebius
transform performs n iterations on its input vector (the truth table of $F$), where
in each iteration, half of the array entries are XORed into the other half. In total,
its complexity is about $n \cdot 2^n$ bit operations. For more details on the Moebius
transform, refer to [5].
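
A minimal Python sketch of the transform described above is given here for illustration. It assumes the truth table is indexed by the integer encoding of the input vector; it is not taken from [5].

```python
def moebius_transform(table):
    """Moebius transform over GF(2).

    table[x] is F(x) for x in range(2**n). The returned list r satisfies
    r[u] = ANF coefficient of the monomial indexed by bitmask u.
    """
    a = list(table)
    size = len(a)
    step = 1
    while step < size:                # one iteration per input variable
        for x in range(size):
            if x & step:
                a[x] ^= a[x ^ step]   # XOR the "bit clear" half into the "bit set" half
        step <<= 1
    return a

# Truth table of f = x0*x1 + x2 + 1 on 3 bits (index = x0 + 2*x1 + 4*x2).
truth_table = [1, 1, 1, 0, 0, 0, 0, 1]
anf = moebius_transform(truth_table)
```

Applying the transform twice returns the original truth table, which gives a convenient self-check.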


2.2 High-Order Differential Cryptanalysis and Interpolation 
Attacks 

In this section, we give a brief summary of high-order differential cryptanalysis 
and interpolation attacks. 


High-Order Differential Cryptanalysis. High-order differential cryptanalysis
was introduced in [7] as an algebraic attack that is particularly efficient
against ciphers of low algebraic degree. The basic variant of high-order
differential cryptanalysis over GF(2) considers some target bit b (which can be
either a ciphertext or an intermediate encryption value) and analyzes its ANF
representation in terms of the plaintext P, denoted by $F_K(P)$ (where K is the
unknown secret key). Given that $deg(F_K(P)) \le dg$ independently of K for dg
(relatively) small, the attacker chooses an arbitrary linear subspace S of
dimension $dg + 1$, and evaluates the cipher (in a chosen plaintext attack) over
its $2^{dg+1}$ inputs. Since every differentiation reduces the algebraic degree of
the target bit by 1 and $deg(F_K(P)) \le dg$, the value of the high-order
differential over S for the target bit b (namely, the sum of evaluations of b
over GF(2)) is equal to zero (refer to [7] for details). High-order differential
properties may be used in key recovery attacks, depending on the specification
of the cipher (refer to [6]). However, such key recovery methods are not part of
the framework described in this section.
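
The zero-sum property can be observed on random low-degree polynomials. The sketch below is an illustration under our own assumptions (polynomials stored as lists of monomial bitmasks, toy sizes n = 8 and dg = 3); it sums a polynomial of degree at most dg over the span of dg + 1 random vectors.

```python
import random
from itertools import combinations

def random_poly(n, d, rng):
    """Random GF(2) polynomial of degree <= d, as a list of monomial bitmasks."""
    return [sum(1 << i for i in idx)
            for k in range(d + 1)
            for idx in combinations(range(n), k)
            if rng.random() < 0.5]

def evaluate(monos, x):
    # A monomial with mask m equals 1 exactly when all of its bits are set in x.
    return sum(1 for m in monos if m & x == m) & 1

def span(basis):
    """All GF(2) linear combinations of the given vectors (ints).

    If the vectors are dependent, elements repeat; the sum below then
    corresponds to the derivative in those directions and is still zero.
    """
    vecs = [0]
    for b in basis:
        vecs += [v ^ b for v in vecs]
    return vecs

def high_order_differential(monos, basis):
    """Sum of the polynomial over the span of basis, over GF(2)."""
    return sum(evaluate(monos, v) for v in span(basis)) & 1

rng = random.Random(1)
n, d = 8, 3
poly = random_poly(n, d, rng)
results = [
    high_order_differential(poly, [rng.randrange(1, 1 << n) for _ in range(d + 1)])
    for _ in range(20)
]
```

Every entry of `results` is 0, as expected for a (dg + 1)-order differential of a polynomial of degree at most dg.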


Interpolation Attacks. The interpolation attack was introduced in 1997 by 
Jakobsen and Knudsen as an algebraic attack on block ciphers [4]. The attack 
is closely related to high-order differential cryptanalysis$^1$ and (similarly to
high-order differential cryptanalysis) is particularly efficient against block ciphers
whose round function is of low algebraic degree. The interpolation attack has 
several variants, and can be applied over a general finite field, exploiting known 
or chosen plaintexts. Here, we give a high-level description of the chosen plaintext 
interpolation attack over GF( 2), as this is the variant we apply to LowMC. 

The attack considers some intermediate encryption target bit b of the block
cipher, whose ANF representation can be expressed from the decryption side
in terms of the ciphertext and key as $F(C, K)$. The key K is viewed as
an unknown constant, and thus we can write

$$F_K(C) = F_K(c_1, \ldots, c_n) = \sum_{u = (u_1, \ldots, u_n) \in GF(2^n)} \alpha_u M_u,$$

where $\alpha_u \in \{0, 1\}$ is the coefficient of the monomial
$M_u = \prod_{i=1}^{n} c_i^{u_i}$. Therefore, the coefficients $\alpha_u$ of $F_K(C)$
generally depend on the secret key and are unknown in advance. The goal of the
interpolation attack is to recover (interpolate) the unknown coefficients of
$F_K(C)$, and then use various ad-hoc techniques (which are not part of the
framework described in this section) in order to recover the actual secret key.

In order to deduce the unknown coefficients of $F_K(C)$, they are considered
as variables (i.e., linearized), and recovered by solving a linear equation system.
For the purpose of constructing the equation system, the attacker assumes that
the algebraic degree dg of the bit b in terms of the bits of the plaintext is
relatively small, which allows the use of high-order differential cryptanalysis (as
described above). More specifically, a high-order differential property is devised
by encrypting a subspace S of plaintexts of dimension $dg + 1$, and performing
high-order differentiation with respect to this subspace, whose outcome is zero
on the bit b.

When expressed in terms of the ciphertexts $C_1, \ldots, C_{2^{dg+1}}$ (obtained by
encrypting the plaintexts of S), this gives the equation
$\sum_{t=1}^{2^{dg+1}} F_K(C_t) = 0$. For each ciphertext $C_t$, $F_K(C_t)$ is merely a
linear expression in the variables $\alpha_u$ (the coefficient of $\alpha_u$ in this
expression is easily deduced by evaluating $M_u$ on $C_t$), and thus the subspace S
gives rise to one linear equation in the variables $\alpha_u$. In order to solve for
the unknown variables $\alpha_u$, the attacker considers several such subspaces, each
giving one equation. In total, the number of equations (and subspaces
considered) needs to be roughly equal to the number of the unknown $\alpha_u$
variables, assuming the equations are sufficiently “random”.

1 In fact, some of its variants directly exploit high-order differential properties, as we
describe next.

From the high-level description above, it is easy to conclude that the data 
and time complexities of the attack depend on the value of the degree dg and 
the number of unknown variables $\alpha_u$. Therefore, in order to mount efficient
interpolation attacks, the attacker tries to minimize these parameters, as we 
demonstrate in our attacks on LowMC. 

2.3 Model of Computation 

Since an exhaustive key search attack (which evaluates the LowMC encryp- 
tion function) and our attacks use different bitwise operations, comparing these 
attacks cannot be done simply by counting the number of encryption function 
evaluations. Instead, we compare the complexity of straight-line implementa- 
tions of the algorithms, counting the number of bit operations (such as XOR, 
AND, OR) on pairs of bits. This computation model ignores operations such 
as moving a bit from one position to another (which only requires renaming 
variables in straight-line programs). As calculated in Sect. 3, the straight-line 
implementation of one encryption function evaluation of LowMC requires about 
$2^{19}$ bit operations. Consequently, a straight-line implementation of exhaustive
search for 80-bit and 128-bit keys requires about $2^{99}$ and $2^{147}$ bit operations,
respectively, and these are quantities of reference for our attacks. 

3 Description of LowMC 

LowMC is a collection of SP-network instances, proposed at Eurocrypt 2015 [1] 
by Albrecht et al. The specification defined two specific instance families which 
are analyzed in this paper, both having a block size of n = 256 bits, and are 
characterized by their key size $\kappa$, which is either 80 or 128 bits. In this paper, we
refer to these instance families as LowMC-80 and LowMC-128. The encryption
function of LowMC applies a sequence of rounds to the plaintext, where each
round contains a (bitwise) round-key addition layer, an Sbox layer, and an affine
layer (over GF(2)). LowMC was designed with distinct features (as detailed
in the pseudocode below): it has a linear key schedule and its affine layers are 
selected at random, where each selection defines a separate instance of the family. 
The Sbox layer of LowMC is composed of 3-bit Sboxes with degree 2 over GF( 2) 
(the actual specification of the Sboxes is irrelevant for our analysis and is omitted 
from this paper). Furthermore, the Sbox layers are only partial, namely, in each 
Sbox layer, only $3m < n$ bits go through an Sbox (where m is a parameter),
while the rest of the $n - 3m$ bits remain unchanged.

Each family instance of LowMC is also defined with a data limit $lim$, which
determines the maximal (recommended) data complexity before changing the
key. In other words, the cipher is guaranteed to offer security according to its key
size as long as the adversary cannot obtain more than $2^{lim}$ plaintext-ciphertext
pairs. The parameters of the two instance families are given in Table 2. 

Table 2. LowMC instance families

Instance Family   Key size $\kappa$   Block size n   Sboxes m   Data lim   Rounds r
LowMC-80          80                  256            49         64         11
LowMC-128         128                 256            63         128        12


The pseudocode of the encryption function (taken from [1]) is given below. 

ciphertext = encrypt(plaintext, key)
    //initial whitening
    state = plaintext + MultiplyWithGF2Matrix(KMatrix(0), key)
    for (i = 1 to r)
        //m computations of 3-bit Sbox, n-3m bits remain the same
        state = Sboxlayer(state)
        //affine layer
        state = MultiplyWithGF2Matrix(LMatrix(i), state)
        state = state + Constants(i)
        //generate round key and add to the state
        state = state + MultiplyWithGF2Matrix(KMatrix(i), key)
    end
    ciphertext = state


The matrices LMatrix(i) are chosen at random from all invertible binary
$n \times n$ matrices, while the matrices KMatrix(i) are chosen independently and
uniformly at random from all binary $n \times \kappa$ matrices of rank $\min(n, \kappa)$. The
constants Constants(i) are chosen independently and uniformly at random from
all binary vectors of length n.
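
To make the round structure concrete, the following Python sketch instantiates a toy cipher with the same layout. It is not a LowMC implementation: the parameters are tiny, the instance generation (`make_instance`, seeded `random`) is ours, and the 3-bit Sbox is a stand-in degree-2 bijection, since the actual Sbox specification is omitted from this paper.

```python
import random

def rank(matrix):
    """Rank over GF(2) of a matrix given as a list of row bitmasks."""
    rows, r = list(matrix), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            r += 1
            low = pivot & -pivot
            rows = [x ^ pivot if x & low else x for x in rows]
    return r

def rand_full_rank(rows, cols, rng):
    """Rejection-sample a rows x cols binary matrix of rank min(rows, cols)."""
    while True:
        m = [rng.getrandbits(cols) for _ in range(rows)]
        if rank(m) == min(rows, cols):
            return m

def mat_vec(matrix, v):
    """Multiply a row-bitmask matrix by the bit vector v over GF(2)."""
    return sum((bin(row & v).count("1") & 1) << i for i, row in enumerate(matrix))

def sbox_layer(state, m):
    """Apply m 3-bit Sboxes to bits 0..3m-1; the remaining bits are unchanged."""
    out = state
    for j in range(m):
        a, b, c = ((state >> (3*j + t)) & 1 for t in range(3))
        # Stand-in degree-2 bijective Sbox (assumption, see lead-in).
        y = (a ^ (b & c)) | ((a ^ b ^ (a & c)) << 1) | ((a ^ b ^ c ^ (a & b)) << 2)
        out = (out & ~(7 << 3*j)) | (y << 3*j)
    return out

def make_instance(n, kappa, m, r, seed=0):
    rng = random.Random(seed)
    return {
        "m": m, "r": r,
        "L": [rand_full_rank(n, n, rng) for _ in range(r)],
        "K": [rand_full_rank(n, kappa, rng) for _ in range(r + 1)],
        "C": [rng.getrandbits(n) for _ in range(r)],
    }

def encrypt(inst, plaintext, key):
    state = plaintext ^ mat_vec(inst["K"][0], key)     # initial whitening
    for i in range(1, inst["r"] + 1):
        state = sbox_layer(state, inst["m"])           # partial Sbox layer
        state = mat_vec(inst["L"][i - 1], state)       # affine layer
        state ^= inst["C"][i - 1]
        state ^= mat_vec(inst["K"][i], key)            # round-key addition
    return state

inst = make_instance(n=9, kappa=6, m=2, r=3)
ciphertexts = {encrypt(inst, p, 0b101101) for p in range(1 << 9)}
```

With n = 9 and m = 2, bits 6 to 8 bypass the Sboxes in every round, mirroring the partial Sbox layer described above. Since every layer is a bijection, `ciphertexts` contains all 512 values.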

In this paper, we denote the 256-bit state at the input to the i’th key addition
layer by $X_{i-1}$ (e.g., the plaintext is denoted $X_0$), the input to the i’th Sbox layer
by $Y_{i-1}$ and the input to the i’th affine layer by $Z_{i-1}$. We refer to the $3m$ bits
of the state that go through Sboxes in the Sbox layer as the S-part, while the
remaining $n - 3m$ bits are referred to as the I-part. Given a state W, denote by
$W|_{SP}$ and $W|_{IP}$ the S-part and I-part of the state, respectively (e.g., $Y_5|_{IP}$
is the I-part of the input state to the 6’th Sbox layer).

It is common practice in cryptanalysis of block ciphers to exchange the order 
of the final two affine operations over GF( 2) (namely, the keyless affine transfor- 
mation and key addition). This allows the attacker to “peel off” the last affine 
transformation at a negligible cost by working with an equivalent last-round key 
(obtained by an affine transformation on the original last-round key). For the 
sake of simplicity, we assume in the following that we have already “peeled off” 
the last affine transformation of the cipher. Therefore, the final states of the last
round r are denoted by $X_{r-1}$, $Y_{r-1}$, $Z_{r-1}$ and $Y_r$, which denotes the ciphertext
(after “peeling off” the final affine transformation).

Each affine layer of LowMC involves multiplication of the 256-bit state with a
$256 \times 256$ matrix. This multiplication requires roughly $2^{16}$ bit operations, and
therefore a single encryption of LowMC (that contains more than 8 rounds)
requires more than $2^{16} \cdot 8 = 2^{19}$ bit operations (as already noted in Sect. 2.3).
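
The reference figures used throughout the paper follow from this count. The short Python sketch below reproduces the arithmetic; the variable names are ours and the cost model (about $n^2$ bit operations per affine layer, 8 rounds counted) is the one stated in the text.

```python
from math import log2

n = 256                          # block size in bits
affine_ops = n * n               # about n^2 bit operations per affine layer
rounds_counted = 8               # the text lower-bounds the cipher by 8 rounds
encryption_ops = affine_ops * rounds_counted

log_enc = log2(encryption_ops)   # log2 of one encryption's cost
# log2 of the exhaustive-search cost for each key size
exhaustive = {k: k + log_enc for k in (80, 128)}
```

This gives $2^{19}$ bit operations per encryption and reference complexities of $2^{99}$ and $2^{147}$ for the 80-bit and 128-bit key sizes.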

4 A Basic 9-Round Attack on LowMC-80

In this section we describe our basic interpolation attack on 9-round LowMC, 
which is given first without optimizations for the sake of clarity. We begin by 
considering the elements that are required for the attack. 

4.1 The High-Order Differential Property 

We construct the high-order differential property used in the interpolation 
attack. A similar property was described by the LowMC designers [1], but we 
reiterate it here for the sake of completeness. 

The algebraic degree of a single round of LowMC-80 over GF(2) is 2, and
therefore the algebraic degree of any bit at the input to the 6’th Sbox layer of
LowMC-80, $Y_5$, in the input bits, $X_0$, is at most $2^5 = 32$. Moreover, as the bits
of the I-part of LowMC do not go through Sboxes in the first round, the degree at
the input to the 7’th Sbox layer, $Y_6$, in the bits of the I-part, $X_0|_{IP}$ (given that
the input bits of the S-part, $X_0|_{SP}$, are constant), is at most 32. Furthermore,
since the bits of the I-part of the 7’th Sbox layer do not go through an Sbox,
the degree of any bit of $Z_6|_{IP}$ in the input bits of the I-part, $X_0|_{IP}$, is at most
32 (given that $X_0|_{SP}$ is constant).

The last property implies that the value of a 33-order differential over any
33-dimensional subspace selected from $X_0|_{IP}$ (keeping $X_0|_{SP}$ constant) is zero
for any bit of $Z_6|_{IP}$. Moreover, as we selected a subspace whose bits do not
go through an Sbox in the first round, the value of a 32-order differential for
any bit of $Z_6|_{IP}$ over any 32-dimensional subspace from $X_0|_{IP}$ is a constant
(independent of the key). This observation implies that we can select several
32-dimensional subspaces, and compute in a preprocessing phase the constants
obtained by summing (over GF(2)) over a target bit of $Z_6|_{IP}$ (for an arbitrary
fixed value of the key). Each such constant (derived from a 32-dimensional
subspace) gives one bit of information that we will exploit as the constant value of
an equation in the interpolation attack.

4.2 Bounding the Number of Variables 

In the interpolation attack on 9-round LowMC-80, we select a target bit from
$Z_6|_{IP}$ and denote its ANF representation in the 256-bit ciphertext (obtained
after inverting the final affine transformation) and 80-bit key by $F(C, K)$. We
consider K as an unknown constant, and write

$$F_K(C) = F_K(c_1, \ldots, c_{256}) = \sum_{u = (u_1, \ldots, u_{256}) \in GF(2^{256})} \alpha_u M_u,$$

where $\alpha_u \in \{0, 1\}$ is the coefficient of the monomial
$M_u = \prod_{i=1}^{256} c_i^{u_i}$. As the complexity of the attack depends on the
number of variables $\alpha_u$, it is important to estimate their number with good
accuracy. An initial estimation can be made by observing that the algebraic
degree of the (inverse) round of LowMC-80 is 2,$^2$ and thus $deg(F_K(C)) \le 4$.
This implies that $\alpha_u = 0$ in case $wt(u) > 4$, and therefore the number of
unknown variables is upper bounded by $\binom{256}{\le 4} \approx 2^{27}$.

2 The algebraic degree of any invertible 3-bit Sbox is (at most) 2.

The initial upper bound on the number of variables can be significantly
improved by considering the specific round function of LowMC-80. For this
purpose, it will be convenient to use additional notation to describe the variables
$\alpha_u$ according to the degree of $M_u$, by defining the set of variables $U_i$ for a
positive integer i as $U_i = \{\alpha_u$ that is not identically zero as a function of the
key $\mid wt(u) = i \wedge u \in GF(2^{256})\}$. We have already seen that $U_i$ is empty for
$i > 4$ (as these variables are identically zero independently of the key), and we
now derive tighter bounds on $|U_i|$ for $i \le 4$. Thus, we analyze the symbolic
representation of the state variables in the decryption direction, starting from
the ciphertext $Y_9$, up to $Y_6$, as polynomials in the ciphertext bits
$c_1, \ldots, c_{256}$.

The ciphertext $Y_9$ contains 256 bits of $c_1, \ldots, c_{256}$, while in order to compute
$Z_8$ we merely add (unknown) constants to these bits (recall that we “peeled off”
the last affine layer). Then, the inverse Sbox layer is applied to $Z_8$ to obtain
the state $Y_8$. Each 3-bit Sbox may contribute (up to) 3 quadratic monomials
to $Y_8$, and 6 monomials in total, e.g., an Sbox corresponding to ciphertext bits
$c_1, c_2, c_3$ may contribute the monomials $c_1, c_2, c_3, c_1 c_2, c_1 c_3, c_2 c_3$. Note that these
monomials may appear in the ANF of different bits of $Y_8$ with different unknown
coefficients (e.g., $c_1 x_1$ and $c_1 x_2$ may appear in the ANF of two different bits of
$Y_8$). However, in interpolation attacks, we consider the ANF of the target bit, in
which the coefficient $\alpha_u$ of every monomial $M_u$ in the ciphertext is linearized and
considered as a single variable. Therefore, the important quantity is the number
of possibilities to create the monomials $M_u$ (for this reason, the monomial $c_1$
is counted only once even if it appears in the ANF of different bits of $Y_8$ with
different unknown coefficients).

Since there are 49 Sboxes, the total number of monomials $M_u$ in the ANF of
the state bits of $Y_8$ is bounded by $|U_2| \le 3 \cdot 49 = 147$, $|U_1| \le 256$ (which is the
trivial bound) and $|U_i| = 0$ for $i \ge 3$. As the affine and key addition mappings
do not influence the number of monomials $M_u$, this bound applies also to $X_8$
and $Z_7$.

Next, the inverse Sbox layer is applied to $Z_7$ to obtain the state $Y_7$, for which
we already know that $|U_i| = 0$ for $i > 4$. Since the Sbox layer is of degree
2, a trivial upper bound on the number of variables $\alpha_u$ in $Y_7$ is obtained by
multiplying the $147 + 256 = 403$ monomials in unordered pairs, giving

$$\left| \bigcup_{i=1}^{4} U_i \right| \le \binom{403}{2} + 403 < 2^{16.5}.$$

Since the key addition and affine layers do not influence the number of
monomials, the upper bound of $2^{16.5}$ also applies to $X_7$ and $Y_6$, and
it is much smaller than our initial bound of about $2^{27}$.
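
Both bounds are easy to recompute. The Python sketch below is an illustration only; `binom_le` is our name for the quantity $\binom{n}{\le d}$ defined in Sect. 2.1.

```python
from math import comb, log2

def binom_le(n, d):
    """Number of subsets of an n-element set of size at most d."""
    return sum(comb(n, i) for i in range(d + 1))

trivial_bound = binom_le(256, 4)           # all monomials of degree <= 4
first_layer = 256 + 3 * 49                 # monomials after one inverse Sbox layer
refined_bound = comb(first_layer, 2) + first_layer

print(round(log2(trivial_bound), 2), round(log2(refined_bound), 2))
```

The refined bound evaluates to 81406, which is below $2^{16.5}$, while the trivial bound lies between $2^{27}$ and $2^{28}$.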

We denote the set of variables $\bigcup_{i=1}^{4} U_i$ by U, and note that the explicit set
$\{u \mid \alpha_u \in U\}$ (which gives the relevant monomials $M_u$) can be easily derived
during preprocessing (which involves a more explicit computation of the monomial
set $\{M_u \mid \alpha_u \in U\}$, whose size is bounded above).

4.3 Obtaining the Data

After deducing that the number of variables in the system of equations is
$|U| \approx 2^{16.5}$, we conclude that we need to differentiate over about $2^{16.5}$
32-dimensional subspaces in order to obtain sufficiently many equations to solve
the system. A trivial way to do this is to select about $2^{16.5}$ arbitrary linearly
independent 32-dimensional subspaces from the $256 - 3 \cdot 49 = 109$ bits of
$X_0|_{IP}$. This results in an attack with data complexity of $2^{32+16.5} = 2^{48.5}$, and is
rather wasteful. A more efficient approach (which was previously used in various
papers such as [2]) is to select a large 37-dimensional subspace S from $X_0|_{IP}$,
containing $\binom{37}{32} > 2^{18}$ linearly independent 32-dimensional subspaces, which
should suffice for the attack (assuming that the constructed system of equations
is sufficiently random). The subspaces are indexed according to $37 - 32 = 5$
constant indexes that are set to zero in S.
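
The counting behind this choice can be sketched as follows. The helper `subspace_labels` is a hypothetical name: it enumerates each sub-cube by the tuple of coordinates that are fixed to zero, which is the indexing described above.

```python
from itertools import combinations
from math import comb

free_bits = 256 - 3 * 49                  # size of the I-part of X_0
big_dim, small_dim = 37, 32
num_subspaces = comb(big_dim, small_dim)  # 32-dimensional sub-cubes inside S

def subspace_labels(big_dim, small_dim):
    """Yield each sub-cube as the tuple of coordinates fixed to zero."""
    return combinations(range(big_dim), big_dim - small_dim)

toy_labels = list(subspace_labels(7, 5))  # toy sizes, small enough to enumerate
```

For the real parameters, `num_subspaces` is 435897, which exceeds $2^{18} = 262144$, at a data cost of only $2^{37}$ chosen plaintexts.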


4.4 The Basic Interpolation Attack 

We now describe a basic interpolation attack on 9-round LowMC-80. We note
that this attack is incomplete, as it only computes the $|U|$ variables $\alpha_u$ using
$e \approx |U|$ equations, without recovering the actual secret key. The details of this
final step will be given in the optimized attack in Sect. 5.2. For the sake of
convenience, we describe the attack in two phases: the preprocessing phase (which
is independent of the data and secret key) and the online phase. However, we take
into account both phases in the total complexity evaluation.

Assume we selected a target bit b from $Z_6|_{IP}$, a subspace S of dimension 37
from $X_0|_{IP}$, and $e \approx |U|$ 32-dimensional subspaces $S_1, \ldots, S_e$ in S. The detailed
attack is described below.


Preprocessing:

1. Compute an e-bit array of free coefficients for $e \approx |U|$ equations, denoted
   by $a_0$: evaluate b on the subset of inputs of S (with the key set to zero),
   and obtain a bit array of size $2^{37}$. Finally, calculate the free coefficients
   by summing on b for the e 32-dimensional subspaces $S_1, \ldots, S_e$ in S,
   and store the result in $a_0$.

2. Calculate the $|U|$ vectors $\{u \mid \alpha_u \in U\}$: this can be done by first
   calculating the 403 monomials $M_u$ past the first Sbox layer, and multiplying
   them in pairs (as described in Sect. 4.2).


Online:

1. Ask for the encryptions of the $2^{37}$ plaintexts in S and store the
   ciphertexts in a table.

2. Allocate a $2^{37} \times |U|$ matrix A, where row $A[t]$ is a bit array that represents
   the evaluation $F_K(C_t)$ (namely, $\sum_{\{u \mid \alpha_u \in U\}} \alpha_u M_u(C_t)$).

3. For each ciphertext $C_t$, calculate $A[t]$ by evaluating $F_K(C_t)$:

   (a) For each $\{u \mid \alpha_u \in U\}$, evaluate the monomial $M_u(C_t)$ (the coefficient
       of $\alpha_u$) and set the corresponding bit entry in $A[t]$ according to the
       result.

4. Allocate an $e \times |U|$ matrix E over GF(2), representing the equation
   system on U.

5. For each 32-dimensional subspace $S_j$ in S, namely $S_1, \ldots, S_e$ (that
   match the subspaces considered in preprocessing Step 1):

   (a) Populate the row (equation) $E[j]$ by summing over the $2^{32}$ rows of
       A corresponding to $S_j$.

6. Solve the equation system $Ex = a_0$, where x represents the vector
   of variables of U and $a_0$ is the vector of free coefficients calculated in
   preprocessing Step 1.
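
Online Step 6 is ordinary linear algebra over GF(2). The sketch below shows one way to solve $Ex = a_0$ with Gauss-Jordan elimination on bitmask rows. It is a generic illustration (the function name `solve_gf2` and the row encoding are our assumptions), not the solver used by the authors.

```python
def solve_gf2(rows, rhs, num_vars):
    """Solve E x = rhs over GF(2) by Gauss-Jordan elimination.

    rows[j] is an int whose bit i is the coefficient of variable i in
    equation j; rhs[j] is the free coefficient. Returns one solution as a
    list of bits (free variables set to 0), or None if inconsistent.
    """
    aug = [(row << 1) | b for row, b in zip(rows, rhs)]   # bit 0 holds the rhs
    pivots = []                                           # (variable, row index)
    r = 0
    for var in range(num_vars):
        bit = 1 << (var + 1)
        pivot = next((j for j in range(r, len(aug)) if aug[j] & bit), None)
        if pivot is None:
            continue
        aug[r], aug[pivot] = aug[pivot], aug[r]
        for j in range(len(aug)):
            if j != r and aug[j] & bit:
                aug[j] ^= aug[r]                          # clear this variable elsewhere
        pivots.append((var, r))
        r += 1
    if any(a == 1 for a in aug[r:]):                      # a row reading 0 = 1
        return None
    x = [0] * num_vars
    for var, j in pivots:
        x[var] = aug[j] & 1
    return x

# Toy system: x0 + x1 = 1, x1 + x2 = 0, x0 + x2 = 1 (redundant), x2 = 1.
rows = [0b011, 0b110, 0b101, 0b100]
rhs = [1, 0, 1, 1]
solution = solve_gf2(rows, rhs, 3)
```

In the attack, each row of E would be the XOR of the $2^{32}$ rows of A belonging to one subspace $S_j$, and `rhs` would be the array $a_0$ from preprocessing.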


The data complexity of the attack is $2^{37}$ chosen plaintexts. The total time
complexity of the attack is about $2^{65}$ bit operations, dominated by online Step
5 (for each of the e subspaces, we sum over $2^{32}$ bit vectors of size $|U|$, requiring
about $e \cdot 2^{32} \cdot |U| \approx 2^{65}$ bit operations). The memory complexity of the attack is
about $2^{37} \cdot |U| \approx 2^{53.5}$ bits, dominated by the storage of the matrix A in online
Step 2.

We note that in the complexity evaluation of the attack we ignore indexing
issues that arise (for example) in Step 3(a) (that maps between a variable $\alpha_u \in U$
and its corresponding column index in $A[t]$), and in Step 5 (that maps between
a subspace $S_j$ in S and the corresponding 5 constant indexes of S). The reason
that we can ignore these mappings in the complexity evaluation is that they are
independent of the secret key and data, and therefore, they can be precomputed
and integrated into the straight-line implementation of the program.

5 The Optimized Interpolation Attack 

In this section, we introduce three optimizations of the basic 9-round attack 
above. The first optimization reorders the steps of the algorithm in order to 
reduce the memory complexity, while the second optimization further exploits 
the structure of chosen plaintexts to reduce the time complexity of the attack. 
Finally, the third optimization is based on a novel technique in interpolation
attacks, and allows us to (further) reduce the data and time complexities. We first
describe informally how to apply the optimizations to the basic 9-round attack 
on LowMC-80 above, and then devise a more formal and generic framework that 
can be applied to other LowMC variants. 

The first two optimizations focus on online Steps 2-5, which compute the
equation system E from the $2^{37}$ ciphertexts. First, we reduce the memory
complexity by noticing that we do not need to allocate the matrix A. Instead, we
work column-wise and focus on a single column $A[*][\ell]$ at a time, corresponding
to some $\{u \mid \alpha_u \in U\}$. We evaluate $M_u(C_t)$ for all ciphertexts (which gives an
array of $2^{37}$ bits, $a_\ell$) and then populate the corresponding column $E[*][\ell]$ by
summing over the 32-dimensional subspaces $S_1, \ldots, S_e$ on $a_\ell$.

Next, we reduce the time complexity by optimizing the summation process:
given a bit array $a_\ell$ of $2^{37}$ entries, the goal is to sum over many 32-dimensional
subspaces (indexed according to 5 bits which are set to zero). This can be done
efficiently using the Moebius transform (refer to Sect. 2.1). For this purpose,
we can view $a_\ell$ as evaluating a 37-variable polynomial over GF(2), and the
summation over a 32-dimensional subspace of $a_\ell$ is equal to the coefficient of its
corresponding 32-degree monomial. All these coefficients are computed by the
Moebius transform in about $37 \cdot 2^{37}$ bit operations. We stress that the reason
that we can use the Moebius transform in this case is purely combinatorial and is
due to the way that we selected the structure of subspaces for the interpolation
attack. Indeed, there does not seem to be any obvious algebraic interpretation
to $a_\ell$ when viewed as a polynomial.
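
The equivalence between sub-cube sums and Moebius coefficients can be checked on toy sizes. In the sketch below, n = 7 and k = 5 stand in for 37 and 32, and the bit array is random; these choices are ours, made only to keep the check fast.

```python
import random

def moebius_transform(table):
    """Moebius transform over GF(2) of a list of 2**n bits."""
    a = list(table)
    step = 1
    while step < len(a):
        for x in range(len(a)):
            if x & step:
                a[x] ^= a[x ^ step]
        step <<= 1
    return a

def subcube_sum(a, mask):
    """XOR of a[x] over all x whose set bits lie inside mask (direct method)."""
    total, x = 0, mask
    while True:
        total ^= a[x]
        if x == 0:
            return total
        x = (x - 1) & mask

rng = random.Random(7)
n, k = 7, 5                                   # toy stand-ins for 37 and 32
a = [rng.getrandbits(1) for _ in range(1 << n)]
coeffs = moebius_transform(a)
masks = [u for u in range(1 << n) if bin(u).count("1") == k]
matches = all(coeffs[u] == subcube_sum(a, u) for u in masks)
```

A single transform therefore yields the sums for all $\binom{n}{k}$ sub-cubes at once, instead of $2^k$ XORs per sub-cube.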

Finally, we optimize the data complexity (and further reduce the time
complexity). In order to achieve this, examine the polynomial $F(K, C)$ (as a function
of both the key and ciphertext) for the target bit b selected in $Z_6|_{IP}$. Due to the
linear key schedule of LowMC, this polynomial is of degree 4, similarly to $F_K(C)$
(in which the key is treated as a constant). We consider a variable $\alpha_u \in U$ and
analyze its ANF in terms of the 80 key bit variables. Since $\alpha_u$ is multiplied with
$M_u$ in $F(K, C)$, then $deg(\alpha_u) + deg(M_u) \le 4$, implying that if $deg(M_u) \ge 2$,
then $deg(\alpha_u) \le 2$. This simple observation is borrowed from cube attacks [2]
and can be used to significantly reduce the number of variables U, as described
next.

Consider all the variables in $U_2 \cup U_3 \cup U_4$ and recall that their number was
upper-bounded in Sect. 4.2 by roughly $2^{16.5}$. However, since all of these variables
are polynomials of degree (at most) 2 in the 80 key bits, they reside in a linear
subspace of monomials of dimension $\binom{80}{2} + 80 = 3240$. This implies that we can
significantly reduce the total number of variables from $\approx 2^{16.5}$ to $3240 + 256 =
3496 < 2^{12}$ (including the 256 variables of $U_1$) by considering linear relations
between the variables of $U_2 \cup U_3 \cup U_4$. An immediate consequence of the reduction
of variables is that we need fewer equations to solve the equation system, and
therefore, we require fewer subspaces (or data) to obtain these equations. More
specifically, a subspace of dimension 35 contains $\binom{35}{32} = 6545 > 2^{12}$ subspaces of
dimension 32, which should suffice for the attack.
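
The figures in this paragraph can be recomputed directly. The following Python sketch is a plain arithmetic check with variable names of our choosing.

```python
from math import comb

key_bits = 80
quadratic_space = comb(key_bits, 2) + key_bits   # key monomials of degree 1 and 2
total_vars = quadratic_space + 256               # plus the 256 variables of U_1
subspaces_35 = comb(35, 32)                      # 32-dim sub-cubes of a 35-dim cube

enough_equations = subspaces_35 >= total_vars
```

The 6545 available sub-cubes exceed the 3496 variables, which is why a 35-dimensional structure ($2^{35}$ chosen plaintexts) is expected to suffice.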

Assuming that we interpolate the variables of $U_2 \cup U_3 \cup U_4$ in terms of the
key and recover their values, then the key itself should be very easy to deduce,
as the variables of $U_3$ are merely key bits.

We note that while the idea above exploits the linear key schedule of LowMC,
the technique is general and can be applied to block ciphers with arbitrary key
schedules. In this case, it would consider each round key as independent. This
increases the number of variables in the (linearized) key, but not necessarily by a
significant factor. For example, if LowMC-80 had a non-linear key schedule, the
optimization above would interpolate $U_2 \cup U_3 \cup U_4$ in terms of $\binom{80}{2} + 80 = 3240$
monomials in the key of round 9, and only 80 additional linear monomials and 
3 • 49 = 294 quadratic monomials in the key of round 8 that are created by 
the inverse Sbox layer of round 8 (we can assume that the key of round 8 is 
added right after the 8’th Sbox layer, as the key addition and affine layer are 
interchangeable) . 


5.1 Transformation of Variables 


In this section, we begin to describe our generic framework for interpolation 
attacks on LowMC by formalizing the last optimization described above. 

Given an instance of LowMC with a 256-bit block, a key size of $\kappa$, and m
Sboxes per layer, we assume that we want to interpolate a target bit b through
the final $r_1$ rounds of the cipher. We first describe in a more generic way how
to calculate the initial set of variables U, and bound its size. As in the 9-round
attack, the number of monomials in the 256 ciphertext bits at $Y_{r-1}$ (after
inverting the final Sbox layer) is bounded by $256 + 3m$. The target bit b is a
polynomial of degree $2^{r_1 - 1}$ in the state $Y_{r-1}$, and thus it contains at most
$\binom{256}{\le 2^{r_1 - 1}}$ monomials. Therefore, the set of monomials with (a priori)
unknown coefficients can be computed by multiplying the $256 + 3m$ monomials in
unordered tuples (with no repetition) of size up to $2^{r_1 - 1}$. Thus,

$$|U| \le \binom{256 + 3m}{\le 2^{r_1 - 1}},$$

and this set can be computed with $|U|$ multiplications of tuples. Note again that
this bound is generally better than the trivial bound of $|U| \le \binom{256}{\le 2^{r_1}}$, which is
obtained due to the fact that b is a polynomial of degree $2^{r_1}$ in the 256 ciphertext
bits.

We consider the target bit b as a polynomial in both the ciphertext and the
key, namely,

$$F(K, C) = F(x_1, \ldots, x_\kappa, c_1, \ldots, c_{256}) = \sum_{u = (u_1, \ldots, u_n) \in GF(2^n)} \alpha_u M_u,$$

where $M_u = \prod_{i=1}^{n} c_i^{u_i}$ and $\alpha_u(x_1, \ldots, x_\kappa)$ is a polynomial from
$GF(2^\kappa)$ to GF(2).

We partition the variables of U into subsets according to the degree of their
monomials in the ciphertext, which is bounded by $deg(F_K(C)) = 2^{r_1}$. Denote
$d = 2^{r_1}$ and write $U = \bigcup_{i=1}^{d} U_i$, where $U_i = \{\alpha_u \in U \mid deg(M_u) = i\}$. Due to
the linear key schedule of LowMC, we have $deg(F(K, C)) = deg(F_K(C)) = d$,
and therefore $deg(\alpha_u) + deg(M_u) \le d$. This allows us to transform the variable
set U into a smaller variable set, considering internal linear relations due to the
fact that $deg(\alpha_u) \le d - deg(M_u)$. We stress again that the variable
transformation technique can be applied to block ciphers with arbitrary key
schedules by considering each round key as independent.
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We choose an integral splitting index 1 < sp < d+1 , and write U = U' U u", 

sp— 1 d 

where U' = (J Ui and U" = |J Ui. The observation above implies that 

i=l i=sp 

the algebraic degree of the variables in U" (in terms of the key) is bounded 
by d — sp, namely, deg(a u ) < d — sp, for each <a u G U" . Therefore, we 
can interpolate each variable of U" in terms of the key, and express it as 
a u = P U M V , where (3 V G {0,1} is the coefficient of the 

{v=(v\ ,...,v K ) \wt(v)<d— sp} 

monomial M v = Yi X T • Note that the coefficients (3 V are independent of the key 

and can be computed during preprocessing. This interpolation transforms the 
set of variables U" into the set of variables V, which are low degree monomials 

in the key bits V = {M v = x v { 1 2 \v = (fi, . . . , v K ) A wt(v ) < d — sp}. Similarly 

i=l 

to the partition of U, we partition the variables of V into subsets according to 

the degree of their monomials in the key, namely Vi = {M v G V\deg(M v ) = i}. 

i 

In addition, we define V<i = [j Vi. Note that a u G Ui is a linear combination 

3 = 1 

of variables in V<(d-i)* 

Recall that our initial set of variables is expressed as $U = U' \cup U''$, where
$U' = \bigcup_{i=1}^{sp-1} U_i$ and $U'' = \bigcup_{i=sp}^{d} U_i$. This set of
variables is transformed via interpolation into a new set of variables
$W = U' \cup V$.

We compute bounds on the sizes of the variable sets as follows:

\[ |U'| \le \binom{256}{\le sp - 1}, \qquad |V| \le \binom{\kappa}{\le d - sp}, \]

\[ |W| = |U'| + |V| \le \binom{256}{\le sp - 1} + \binom{\kappa}{\le d - sp}. \]



The Variable Transformation Algorithm. We now describe the algorithm
which interpolates a variable $\alpha_u \in U_i$ in terms of the variable set
$V_{\le(d-i)}$. For the sake of efficiency, the algorithm is performed in two
phases, where in the first phase, we evaluate the polynomial $\alpha_u$ in terms of
the key for all relevant keys of low Hamming weight and store the results. Note
that each evaluation of $\alpha_u$ requires summing over $2^i$ evaluations of the
target bit $b$. In the second phase, we use the evaluations to interpolate
$\alpha_u$ in terms of $V_{\le(d-i)}$.


1. Allocate a bit array $a_1$ of size $|V_{\le(d-i)}|$ for the evaluations of $\alpha_u$.

2. Evaluate $\alpha_u$ for each key with Hamming weight at most $d - i$. Namely,
for each key in the set $\{K \mid wt(K) \le d - i\}$:

(a) Evaluate $F(K, C)$ (the target bit) on the subset of $2^i$ inputs (with
the fixed key $K$) $\{K, C \mid \bar{u} \wedge C = 0\}$, sum the result over
$GF(2)$, and store it in $a_1$.

3. Allocate a bit array $a_2$ of size $|V_{\le(d-i)}|$ for the interpolation of
$\alpha_u$ in terms of $V_{\le(d-i)}$.

4. For each $M_v \in V_{\le(d-i)}$ (with index $\ell$), the coefficient $\beta_v$ of
$M_v$ in $\alpha_u$ is calculated as follows:

(a) Sum the $2^{wt(v)}$ values of $a_1$ calculated for the subset of keys
$\{K \mid \bar{v} \wedge K = 0\}$, and store the result in $a_2[\ell]$.
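The two phases above can be sketched on a toy scale as follows. This is a minimal illustration and not the authors' implementation: the function names, the 4-bit toy polynomial `toy_F`, and the use of dictionaries in place of bit arrays are our own assumptions, and the constant monomial (weight-0 mask) is included for completeness. Bit vectors are represented as Python integers, so the condition $\bar{u} \wedge C = 0$ means that `C` is a sub-mask of `u`.

```python
from itertools import combinations

def submasks(mask):
    """Yield every x contained in mask, i.e. with (~mask & x) == 0."""
    sub = mask
    while True:
        yield sub
        if sub == 0:
            return
        sub = (sub - 1) & mask

def low_weight_masks(nbits, max_wt):
    """Yield all nbits-bit masks of Hamming weight at most max_wt."""
    for w in range(max_wt + 1):
        for pos in combinations(range(nbits), w):
            yield sum(1 << p for p in pos)

def interpolate_alpha(F, u, kappa, d):
    """Return {v: beta_v} for the coefficient alpha_u of the monomial M_u."""
    i = bin(u).count("1")                 # deg(M_u)
    # Phase 1 (Steps 1-2): evaluate alpha_u on every key of weight <= d - i.
    a1 = {}
    for K in low_weight_masks(kappa, d - i):
        acc = 0
        for C in submasks(u):             # the 2^i inputs C contained in u
            acc ^= F(K, C)
        a1[K] = acc
    # Phase 2 (Steps 3-4): sum the stored evaluations over keys contained in v.
    a2 = {}
    for v in low_weight_masks(kappa, d - i):
        acc = 0
        for K in submasks(v):             # the 2^wt(v) keys contained in v
            acc ^= a1[K]
        a2[v] = acc
    return a2

def toy_F(K, C):
    """Hypothetical target bit: c0*c1*(k0*k1 + k2) + c2*k3 + c0 + k1 over GF(2)."""
    k = [(K >> j) & 1 for j in range(4)]
    c = [(C >> j) & 1 for j in range(4)]
    return (c[0] & c[1] & ((k[0] & k[1]) ^ k[2])) ^ (c[2] & k[3]) ^ c[0] ^ k[1]

beta = interpolate_alpha(toy_F, u=0b0011, kappa=4, d=4)
support = sorted(v for v, bit in beta.items() if bit)
print([bin(v) for v in support])
```

In this toy instance the recovered support consists of the masks for $k_0 k_1$ and $k_2$, matching the coefficient $k_0 k_1 + k_2$ of $c_0 c_1$ in `toy_F`.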


The total number of evaluations of $b$ in Step 2 is $2^i \cdot |V_{\le(d-i)}|$, each
requiring $r_1 \cdot 2^{16}$ bit operations. Therefore, the total complexity of this
step is $r_1 \cdot 2^{16+i} \cdot |V_{\le(d-i)}|$. Step 4 requires less than
$|V_{\le(d-i)}| \cdot 2^{d-i}$ bit operations. In total, the interpolation of
$\alpha_u \in U_i$ requires $|V_{\le(d-i)}| \cdot (r_1 \cdot 2^{16+i} + 2^{d-i})$ bit
operations.

Since $U'' = \bigcup_{i=sp}^{d} U_i$, we can write the complexity of interpolating
all the variables as

\[ \sum_{i=sp}^{d} |U_i| \cdot |V_{\le(d-i)}| \cdot (r_1 \cdot 2^{16+i} + 2^{d-i}). \]

A simple way to bound this complexity is

\[ |U''| \cdot |V| \cdot (r_1 \cdot 2^{16+d} + 2^{d-sp}) \approx |U''| \cdot |V| \cdot r_1 \cdot 2^{16+d}. \]

In some cases, we can obtain a refined bound by writing the complexity as

\begin{align*}
& |U_{sp}| \cdot |V_{\le(d-sp)}| \cdot (r_1 \cdot 2^{16+sp} + 2^{d-sp}) + \sum_{i=sp+1}^{d} |U_i| \cdot |V_{\le(d-i)}| \cdot (r_1 \cdot 2^{16+i} + 2^{d-i}) \\
\le\; & |U_{sp}| \cdot |V_{\le(d-sp)}| \cdot (r_1 \cdot 2^{16+sp} + 2^{d-sp}) + |U''| \cdot |V_{\le(d-sp-1)}| \cdot (r_1 \cdot 2^{16+d} + 2^{d-sp+1}) \\
\approx\; & |U_{sp}| \cdot |V| \cdot (r_1 \cdot 2^{16+sp} + 2^{d-sp}) + |U''| \cdot |V_{\le(d-sp-1)}| \cdot r_1 \cdot 2^{16+d}.
\end{align*}

Note that the bound is potentially better than the trivial one of
$|U''| \cdot |V| \cdot r_1 \cdot 2^{16+d}$, as $|U_{sp}| \le \binom{256}{sp}$, which may
be smaller than $|U''|$. Moreover, $|V_{\le(d-sp-1)}| \le \binom{\kappa}{\le d-sp-1}$,
which is smaller than $|V|$.


Transformation of Equations. After computing the transformation of variables
from $U''$ to $V$, we need to apply the actual transformation to every equation
over $U$ that we calculated. Namely, we are interested in transforming an
equation over the variable set $U = U' \cup U''$ into an equation over the variable
set $W = U' \cup V$. Obviously, the coefficients of the variables of $U'$ remain the
same, and we need to apply the transformation for every variable $\alpha_u \in U''$.
The complexity of transforming a single variable $\alpha_u \in U_i$ in a single
equation is simply equal to its number of coefficients over $V$, namely
$|V_{\le(d-i)}|$. Therefore, the complexity of transforming all the variables
$\alpha_u \in U''$ in an equation is $\sum_{i=sp}^{d} |U_i| \cdot |V_{\le(d-i)}|$. A
simple upper bound on this complexity is

\[ |U''| \cdot |V|. \]

Similarly to the variable transformation algorithm, a refined upper bound can
be calculated as

\[ |U_{sp}| \cdot |V| + |U''| \cdot |V_{\le(d-sp-1)}|. \]

In total, if we transform $e$ equations, the complexity calculations above are
multiplied by $e$.
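The transformation of a single equation can be sketched as follows, assuming that each interpolated variable of $U''$ is stored as the list of indices of the monomials of $V$ on which its coefficient $\beta_v$ is 1. The names `transform_equation` and `beta_rows` and the toy data are hypothetical and serve only as an illustration.

```python
def transform_equation(coeffs_U2, beta_rows, size_V):
    """Rewrite one equation over U'' as an equation over V (all over GF(2)).

    coeffs_U2 -- list of bits: the equation's coefficient for each alpha_u in U''
    beta_rows -- beta_rows[t] lists the indices of the monomials of V that
                 appear in the interpolation of the t-th variable of U''
    size_V    -- number of monomials in V
    """
    row = [0] * size_V
    for coeff, support in zip(coeffs_U2, beta_rows):
        if coeff:                      # only variables present in the equation
            for idx in support:
                row[idx] ^= 1          # addition over GF(2)
    return row

# Toy data: three variables of U'' expressed over |V| = 4 key monomials.
beta_rows = [[0, 2], [2, 3], [1]]
print(transform_equation([1, 1, 0], beta_rows, 4))
```

The work per equation is the total length of the supports that are touched, which is what the sum $\sum_{i=sp}^{d} |U_i| \cdot |V_{\le(d-i)}|$ bounds.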

Finally, we observe that the splitting index determines the complexity of 
the variable and equation transformation algorithms. Furthermore, the splitting 
index also determines $|W|$, which in turn determines the number of equations $e$.
In general, we will choose $sp$ in order to minimize $|W|$, which in turn minimizes
the data and time complexity of the attack. 
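To illustrate the choice of the splitting index, the sketch below searches for the value of $sp$ that minimizes the upper bound on $|W|$ derived earlier in this section. The helper names are ours, $\binom{n}{\le k}$ is assumed to mean $\sum_{i=0}^{k}\binom{n}{i}$, and minimizing the bound is only a proxy for minimizing $|W|$ itself.

```python
from math import comb

def binom_le(n, k):
    """Sum of binomial coefficients C(n, 0) + ... + C(n, k); 0 if k < 0."""
    return sum(comb(n, i) for i in range(k + 1))

def w_bound(sp, d, kappa, block=256):
    """Upper bound on |W| = |U'| + |V| for splitting index sp."""
    return binom_le(block, sp - 1) + binom_le(kappa, d - sp)

def best_split(d, kappa):
    """Splitting index in 1..d+1 that minimises the bound on |W|."""
    return min(range(1, d + 2), key=lambda sp: w_bound(sp, d, kappa))

# (key size, degree d) pairs corresponding to the attacks of Sects. 6 and 7.
for kappa, d in [(80, 4), (80, 8), (128, 8), (128, 16)]:
    print(kappa, d, best_split(d, kappa))
```

Under these assumptions the minimizers coincide with the splitting indices used in the concrete attacks ($sp = 2$, $4$, $4$ and $8$, respectively).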


5.2 Details of the Optimized Interpolation Attack 

Given an instance of LowMC with a 256-bit block, a key size of $\kappa$, and $m$
Sboxes per layer, we interpolate a target bit $b$ through the final $r_1$ rounds of
the cipher. Let $U$, $U'$, $U''$, $V$ and $W$ be as defined above, and let
$e \approx |W|$ denote the number of equations. Assume $S$ is a sufficiently large
subspace of plaintexts, such that it contains $e$ smaller subspaces
$S_1, \ldots, S_e$ whose high-order differential on $b$ is a constant value
(independent of the key).

The preprocessing phase of the optimized attack is described below.


Preprocessing:

1. Compute an $e$-bit array of free coefficients for $e \approx |W|$ equations,
denoted by $a_0$: evaluate $b$ on the subset of inputs (plaintexts) of $S$ (with
the key set to zero), and obtain a bit array of size $|S|$. Then, calculate
the free coefficients by applying the Moebius transform to the bit array,
and copy the values of the sums over $S_1, \ldots, S_e$ to $a_0$.

2. Calculate the $|U|$ vectors $\{u \mid \alpha_u \in U\}$: This is done by first
calculating the $256 + 3m$ monomials past the first Sbox layer, and multiplying
them in unordered tuples (with no repetition) of size up to $2^{r_1-1}$ (as
described in Sect. 5.1).


Step 1 involves $|S|$ evaluations of the encryption scheme and one application
of the Moebius transform on a vector of size $|S|$. Altogether, it requires
$|S| \cdot 2^{19} + \log(|S|) \cdot |S| \approx |S| \cdot 2^{19}$ bit operations (as
$\log(|S|) \ll 2^{19}$). Step 2 requires $|U|$ monomial multiplications; each
monomial can be represented with a 256-bit array, and therefore this step
requires $2^8 \cdot |U|$ bit operations.

A summary of the complexity analysis of the preprocessing phase is as follows.

Step 1: $2^{19} \cdot |S|$

Step 2: $2^8 \cdot |U|$

In terms of memory, Step 1 requires $|S|$ bits, while Step 2 requires
$2^8 \cdot |U|$ bits.
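For readers unfamiliar with the Moebius transform used in Step 1, the following is a minimal sketch of the standard in-place butterfly over GF(2), checked against a brute-force sum. It assumes that the points of $S$ are indexed by bit masks and that each smaller subspace corresponds to a mask of active bits, so that the entry at mask `u` after the transform is the sum of the evaluated bit over the subspace spanned by the bits of `u`. The function names are ours.

```python
import random

def moebius_transform(bits):
    """In-place XOR sum over sub-masks: bits[u] becomes the XOR of the
    original bits[x] over all x contained in u (about n * 2^n bit operations)."""
    size = len(bits)              # must be a power of two, 2^n
    step = 1
    while step < size:
        for x in range(size):
            if x & step:
                bits[x] ^= bits[x ^ step]
        step <<= 1
    return bits

def brute_force(bits):
    """Reference implementation: direct XOR over all sub-masks of every mask."""
    size = len(bits)
    return [sum(bits[x] for x in range(size) if x & ~u == 0) % 2
            for u in range(size)]

random.seed(1)
table = [random.randint(0, 1) for _ in range(2 ** 6)]
assert moebius_transform(table[:]) == brute_force(table)
```

A single transform therefore yields the sums over all sub-cubes of $S$ at once, which is why one application suffices to read off the values for $S_1, \ldots, S_e$.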


Online:

1. Ask for the encryptions of the plaintexts in $S$ and store the ciphertexts
in a table.

2. Allocate a bit vector of size $|S|$ for the storage of the vectors $a_\ell$ (the
$\ell$'th column of the matrix $A$ in the basic attack).

3. Allocate an $e \times |W|$ matrix $E$ over $GF(2)$, representing the (reduced)
equation system on $W$. The matrix is vertically decomposed into two
smaller matrices: $E_1$ of size $e \times |U'|$ and $E_2$ of size $e \times |V|$.

4. For each $\{M_u \mid \alpha_u \in U\}$ with an index $\ell$:

(a) For each ciphertext $C_t$, calculate $a_\ell[t]$ by evaluating $M_u(C_t)$.

(b) Use the Moebius transform to sum over all subspaces of $a_\ell$.

(c) If $\alpha_u \in U'$, populate column $\ell$ of $E_1$: For each subspace $S_j$
in $S$, namely $S_1, \ldots, S_e$, obtain its corresponding sum from $a_\ell$ and
copy it to $E_1[j][\ell]$.

(d) Otherwise, $\alpha_u \in U''$:

i. Given that $\alpha_u \in U_i$, interpolate the coefficients of $V_{\le(d-i)}$
in $\alpha_u$ as described in Sect. 5.1.

ii. For each subspace $S_j$ in $S$, obtain its corresponding boolean sum
from $a_\ell$ (the coefficient of $\alpha_u$ over $U$). If the sum is 1, then add
(over $GF(2)$) the interpolated coefficients into their indexes in
$E_2[j]$ (as described in Sect. 5.1).

5. Solve the equation system $Ex = a_0$, where $x$ represents the vector
of variables of $W = U' \cup V$ and $a_0$ is the vector of free coefficients
calculated in preprocessing Step 1.

6. Deduce the $\kappa$-bit secret key, which is simply given by the monomials $V_1$
(namely, the monomials of degree 1 in $V$).


The complexity of Step 1 is $|S|$ encryptions, or $|S| \cdot 2^{19}$ bit operations. In
Step 4, we iterate over $|U|$ monomials, where for each one we first evaluate
$M_u(C_t)$ for each ciphertext in Step 4.a. Each such evaluation can be performed
with $d$ bit operations (as $\deg(M_u) \le d$), and thus the monomial evaluations
require about $d \cdot |S| \cdot |U|$ bit operations. Next, we apply the Moebius
transform in Step 4.b, requiring about $\log(|S|) \cdot |S|$ bit operations, and
therefore the complexity of all the transforms is about
$\log(|S|) \cdot |S| \cdot |U|$. The complexity of interpolating all the variables in
Step 4.d.i is bounded in Sect. 5.1 by $|U''| \cdot |V| \cdot r_1 \cdot 2^{16+d}$.
The complexity of Step 4.d.ii (over all $\alpha_u \in U''$) is bounded in Sect. 5.1 by
$e \cdot |U''| \cdot |V| \approx |W| \cdot |U''| \cdot |V|$.

The complexity of Step 5 is $|W|^3$ bit operations using Gaussian elimination.
A summary of the complexity analysis of the online phase is as follows. Since we
generally do not have a good bound for $|U''|$, we simply replace it with $|U|$ (as
$|U''| \le |U|$), and further assume that $e \approx |W|$.


Step 1: $|S| \cdot 2^{19}$

Step 2: $|S|$

Step 3: $|W| \cdot |W|$

Step 4.a: $d \cdot |S| \cdot |U|$

Step 4.b: $\log(|S|) \cdot |S| \cdot |U|$

Step 4.c: $|U'| \cdot |W|$

Step 4.d.i: $|U| \cdot |V| \cdot r_1 \cdot 2^{16+d}$

Step 4.d.ii: $|W| \cdot |U| \cdot |V|$

Step 5: $|W|^3$

Step 6: negligible

Alternatively, we can use the refined complexity bounds for steps 4.d.i and
4.d.ii, as calculated in Sect. 5.1.

Step 4.d.i: $|U_{sp}| \cdot |V| \cdot (r_1 \cdot 2^{16+sp} + 2^{d-sp}) + |U| \cdot |V_{\le(d-sp-1)}| \cdot r_1 \cdot 2^{16+d}$

Step 4.d.ii: $|W| \cdot (|U_{sp}| \cdot |V| + |U| \cdot |V_{\le(d-sp-1)}|)$

The total data complexity of the algorithm is $|S|$ chosen plaintexts. The total
time complexity is dominated by steps 4 and 5, as calculated above. The memory
complexity is potentially dominated by a few steps: the storage of variables in
preprocessing that requires $2^8 \cdot |U|$ bits, the storage of ciphertexts in Step 1
that requires $2^8 \cdot |S|$ bits, and the storage of $E$ in Step 3 that requires
$|W| \cdot |W|$ bits.

6 Optimized Interpolation Attacks on LowMC-80 

In this section we apply the optimized interpolation attack on LowMC-80, for
which $\kappa = 80$ and $m = 49$.


6.1 A 9-Round Attack 

As in the basic attack described in Sect. 4.4, we select the target bit $b$ in
$Z_6|IP$, using subspaces of dimension 32 to obtain the equations. We interpolate
through $r_1 = 2$ rounds, implying that $d = 2^{r_1} = 4$. Therefore
$|U| = \binom{256+3m}{\le 2^{r_1-1}} = \binom{403}{\le 2} \approx 2^{16.5}$.

As described at the beginning of Sect. 5, we use $sp = 2$. We compute the
size of the relevant variable sets
$|U'| \le \binom{256}{\le sp-1} = \binom{256}{\le 1} \approx 2^{8}$,
$|V| \le \binom{\kappa}{\le d-sp} = \binom{80}{\le 2} < 2^{12}$,
$|W| = |U'| + |V| < 2^{12}$.

We choose a subspace $S$ of dimension 35 from $X_0|IP$, containing
$\binom{35}{32} > 2^{12} > |W|$ 32-dimensional subspaces, which should suffice for
the attack.

In terms of time complexity, the analysis of the critical steps of the attack is
as follows:

Step 4.a: $d \cdot |S| \cdot |U| \approx 4 \cdot 2^{35} \cdot 2^{16.5} = 2^{53.5}$

Step 4.b: $\log(|S|) \cdot |S| \cdot |U| \approx 35 \cdot 2^{35} \cdot 2^{16.5} \approx 2^{56.5}$

Step 4.c: $|U'| \cdot |W| \approx 2^{8} \cdot 2^{12} = 2^{20}$

Step 4.d.i: $|U| \cdot |V| \cdot r_1 \cdot 2^{16+d} \approx 2^{16.5} \cdot 2^{12} \cdot 2 \cdot 2^{20} = 2^{49.5}$

Step 4.d.ii: $|W| \cdot |U| \cdot |V| \approx 2^{12} \cdot 2^{16.5} \cdot 2^{12} = 2^{40.5}$

Step 5: $|W|^3 \approx 2^{12 \cdot 3} = 2^{36}$

In total, the time complexity of the optimized 9-round attack is about $2^{57}$ bit
operations (or $2^{57-19} = 2^{38}$ encryptions), mostly dominated by Step 4.b. The
data complexity is $2^{35}$ chosen plaintexts. The memory complexity is dominated
by the storage of ciphertexts in Step 1, and is about $|S| \cdot 2^{8} = 2^{43}$ bits.
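The step-by-step estimates above are simple exponent arithmetic, which the following sketch reproduces from the simple (non-refined) bounds of Sect. 5.2. The function names and the dictionary layout are our own; the inputs are the rounded base-2 logarithms of the set sizes quoted in the text, so the outputs are approximations of the same quality.

```python
from math import log2

def online_costs_log2(d, r1, log_S, log_U, log_U1, log_V, log_W):
    """log2 of the simple per-step bounds of the online phase (Sect. 5.2).
    All log_* arguments are base-2 logarithms of the corresponding set sizes
    (log_U1 stands for |U'|)."""
    return {
        "4.a": log2(d) + log_S + log_U,
        "4.b": log2(log_S) + log_S + log_U,
        "4.c": log_U1 + log_W,
        "4.d.i": log_U + log_V + log2(r1) + 16 + d,
        "4.d.ii": log_W + log_U + log_V,
        "5": 3 * log_W,
    }

def log2_sum(exponents):
    """log2 of the sum of 2^e over the given exponents."""
    top = max(exponents)
    return top + log2(sum(2 ** (e - top) for e in exponents))

# 9-round attack on LowMC-80 with the rounded sizes quoted above.
costs = online_costs_log2(d=4, r1=2, log_S=35, log_U=16.5,
                          log_U1=8, log_V=12, log_W=12)
for step, value in costs.items():
    print(f"Step {step}: 2^{value:.1f}")
total = log2_sum(list(costs.values()))
print(f"total: 2^{total:.1f}")
```

The total stays below $2^{57}$ and is dominated by Step 4.b, in line with the summary above.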

We note that while the improvement of the optimized attack compared to 
the basic one is rather moderate for the 9-round attack, the effect of our opti- 
mizations is more pronounced in the attacks described next, as the reduction in 
the number of variables becomes more significant (a comparison for the attack 
on full LowMC-128 is at the end of Sect. 7.2). 


6.2 A 10-Round Attack 

Similarly to the 9-round attack, in order to attack 10 rounds of LowMC-80, we
select the target bit $b$ in $Z_6|IP$, using subspaces of dimension 32 to obtain the
equations. We interpolate through $r_1 = 3$ rounds, implying that
$d = 2^{r_1} = 8$. Therefore
$|U| = \binom{256+3m}{\le 2^{r_1-1}} = \binom{403}{\le 4} < 2^{30.5}$.

In this attack we use $sp = 4$, and compute the size of the relevant variable sets
$|U'| \le \binom{256}{\le sp-1} = \binom{256}{\le 3} \approx 2^{21.5}$,
$|V| \le \binom{\kappa}{\le d-sp} = \binom{80}{\le 4} < 2^{21}$,
$|W| = |U'| + |V| < 2^{22.5}$. We use the refined analysis for steps 4.d.i and
4.d.ii, and thus we also calculate
$|U_{sp}| = |U_4| = \binom{256}{4} < 2^{27.5}$ and
$|V_{\le(d-sp-1)}| = \binom{80}{\le 3} < 2^{16.5}$.

We choose a subspace $S$ of dimension 39 from $X_0|IP$, containing
$\binom{39}{32} > 2^{23} > |W|$ 32-dimensional subspaces.

In terms of time complexity, the analysis of the critical steps of the attack is
as follows (using the refined analysis for steps 4.d.i and 4.d.ii):

Step 4.a: $d \cdot |S| \cdot |U| \approx 8 \cdot 2^{39} \cdot 2^{30.5} = 2^{72.5}$

Step 4.b: $\log(|S|) \cdot |S| \cdot |U| \approx 39 \cdot 2^{39} \cdot 2^{30.5} \approx 2^{75}$

Step 4.c: $|U'| \cdot |W| \approx 2^{21.5} \cdot 2^{22.5} = 2^{44}$

Step 4.d.i: $|U_{sp}| \cdot |V| \cdot (r_1 \cdot 2^{16+sp} + 2^{d-sp}) + |U| \cdot |V_{\le(d-sp-1)}| \cdot r_1 \cdot 2^{16+d} \approx 2^{27.5} \cdot 2^{21} \cdot (3 \cdot 2^{20} + 2^{4}) + 2^{30.5} \cdot 2^{16.5} \cdot 3 \cdot 2^{24} \approx 2^{70} + 2^{72.5} \approx 2^{73}$

Step 4.d.ii: $|W| \cdot (|U_{sp}| \cdot |V| + |U| \cdot |V_{\le(d-sp-1)}|) \approx 2^{22.5} \cdot (2^{27.5} \cdot 2^{21} + 2^{30.5} \cdot 2^{16.5}) \approx 2^{22.5} \cdot (2^{48.5} + 2^{47}) \approx 2^{71.5}$

Step 5: $|W|^3 \approx 2^{22.5 \cdot 3} = 2^{67.5}$

In total, the time complexity of the optimized 10-round attack is about $2^{76}$
bit operations (or $2^{57}$ encryptions), mostly dominated by Step 4.b. The data
complexity is $2^{39}$ chosen plaintexts. The memory complexity is dominated by
the storage of ciphertexts in Step 1, and is about $2^{8} \cdot |S| = 2^{47}$ bits (note
that the storage of $E$ requires $2^{22.5 \cdot 2} = 2^{45}$ bits).

6.3 An Attack on Full LowMC-80 for Weak Instances 

The 9 and 10-round attacks described above can be extended by an additional
round with negligible cost for a subset of weak instances containing a fraction
of about $2^{-38}$ of all instances. In particular, this implies that about $2^{-38}$
of the instances of full 11-round LowMC-80 can be attacked significantly faster
than exhaustive search.

Consider the 10-round attack: as shown above, we can construct an efficient
high-order differential property for any choice of target bit of $Z_6|IP$, and also
for any linear combination of the bits of $Z_6|IP$. When considering interpolation
from the decryption side on a full 11-round instance, we can efficiently interpolate
the polynomial $F_K(C)$ for any bit of $Z_7|IP$, or any linear combination of the
bits of $Z_7|IP$. Assume that there exists a linear dependency between the 109
bits of $Z_6|IP$ and the 109 bits of $Z_7|IP$. In this case, the linear combination
in terms of $Z_6|IP$ does not go through an Sbox in round 8. Therefore, it is
possible to extend the high-order differential property on this linear combination
by another round with essentially no extra cost, and choose the target bit for
interpolation to be the corresponding linear combination on the bits of $Z_7|IP$.
The existence of this linear dependency is determined by the affine layer of round
7 (the transformation between $Z_6$ and $X_7$), and assuming that random invertible
matrices behave roughly the same (with respect to the event considered) as
random matrices, the probability of this event is about
$2^{109+109-256} = 2^{-38}$ (over the choice of the 7'th affine layer).
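The heuristic count can be sanity-checked on a toy scale. The sketch below is our own illustration and relies on several assumptions that are not spelled out in the text: the affine layer is modelled as a uniformly random (not necessarily invertible) binary matrix, a linear dependency is modelled as a nonzero combination of the I-part output rows that vanishes on all columns fed by the Sbox part, and the sizes (an 8-bit block with a 3-bit I-part) are chosen only so that exhaustive enumeration is feasible.

```python
from itertools import product

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integers."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot                     # lowest set bit of pivot
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank

ip, n = 3, 8                  # toy I-part size and block size (our choice)
sp_cols = n - ip              # columns fed by the Sbox part of the state
weak = total = 0
# Enumerate every ip x sp_cols submatrix; each row is an sp_cols-bit integer.
for rows in product(range(2 ** sp_cols), repeat=ip):
    total += 1
    if gf2_rank(rows) < ip:   # some nonzero row combination vanishes
        weak += 1
print(weak, total, weak / total, 2.0 ** (ip + ip - n))
```

In this toy setting the exact fraction of weak submatrices is close to, and slightly below, the estimate $2^{ip+ip-n}$, which is consistent with treating the exponent $109 + 109 - 256$ as an approximation.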

We note that there exists an additional subset of weak instances of about 
the same size since the described attacks can also be mounted using chosen 
ciphertexts (where interpolation is performed on the decrypted plaintexts). In 
this case, the weakness of a given instance is determined by the choice of the 
third affine layer. 

7 Optimized Interpolation Attacks on LowMC-128 

In this section we apply the optimized interpolation attack on LowMC-128, for
which $\kappa = 128$ and $m = 63$.

7.1 An 11-Round Attack and Weak Instances of LowMC-128 

We describe our attack on 11-round LowMC-128 and then extend it to full
LowMC-128 for weak instances. We select the target bit $b$ in $Z_7|IP$, and
interpolate through $r_1 = 3$ rounds, implying that $d = 2^{r_1} = 8$. Therefore
$|U| = \binom{256+3m}{\le 2^{r_1-1}} = \binom{445}{\le 4} < 2^{31}$.

In this attack we use $sp = 4$, and compute the size of the relevant variable
sets $|U'| \le \binom{256}{\le sp-1} = \binom{256}{\le 3} \approx 2^{21.5}$,
$|V| \le \binom{\kappa}{\le d-sp} = \binom{128}{\le 4} \approx 2^{23.5}$,
$|W| = |U'| + |V| \approx 2^{24}$.

For the high-order differential property, we use subspaces of dimension
$2^{6} = 64$ whose bits are not multiplied together in the first round. The outcome
of such a high-order differential is a constant (independent of the key) for
$1 + 6 = 7$ rounds, and this property can be extended beyond the 8'th Sbox layer
when selecting the target bit from $Z_7|IP$.

Since $|W| \approx 2^{24}$, we require roughly the same number of 64-dimensional
subspaces to construct the equation system and mount the attack. Therefore,
we take a larger subspace of dimension 70, containing
$\binom{70}{64} > 2^{24} \approx |W|$ 64-dimensional subspaces. As $X_0|IP$ contains
only 67 bits, we choose the subspace from these 67 bits and an additional 3 bits
in $X_0|SP$, contained in 1 active Sbox. Since the active Sbox is non-linear, we
guess the 3 linear key expressions that are added to its input, which allows us to
construct the required $\approx 2^{24}$ 64-dimensional subspaces from a
70-dimensional subspace after the first Sbox layer.

The guess of the 3 key bits can be avoided by selecting the $70 - 64 = 6$
constant bits of the 64-dimensional subspaces from the 67 bits of $X_0|IP$ in the
70-dimensional subspace. This restriction keeps the selected Sbox fully active
in all subspaces, and thus the linear subspace after the first Sbox layer (at $Z_0$)
is independent of the key bits. The number of such restricted 64-dimensional
subspaces is $\binom{67}{6} > 2^{24} \approx |W|$, and hence they should suffice for
the attack.

Finally, we notice that the Moebius transforms (Step 4.b) can be optimized
due to the way that we chose the subspaces in $S$, as for all of them, 3 specific
bits of $X_0|SP$ are active. In order to exploit this, we perform the Moebius
transform on a $2^{70}$-bit vector in two phases: in the first phase, we partition the
big $2^{70}$ subspace into $2^{67}$ 3-dimensional subspaces according to the 67 bits
of $X_0|IP$, and sum over all of them in time $2^{70}$, obtaining a vector of size
$2^{67}$. In the second phase, we perform the Moebius transform on the
$2^{67}$-bit vector computed in the first phase. Therefore, the complexity of a
single Moebius transform is reduced from $70 \cdot 2^{70} \approx 2^{76}$ to
$2^{70} + 67 \cdot 2^{67} \approx 2^{73}$. The complexity of online Step 4.b now
becomes $|U| \cdot 2^{73} \approx 2^{104}$ bit operations.
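The two-phase transform can be illustrated on a toy scale. In the sketch below (our own, with hypothetical names), the always-active bits are assumed to be the low-order bits of the index, and the result is compared with the entries of a full Moebius transform at the masks in which those bits are set. The last lines reproduce the cost comparison from the text.

```python
import random
from math import log2

def moebius_transform(bits):
    """Return the XOR sums over sub-masks (standard butterfly over GF(2))."""
    bits = bits[:]
    step = 1
    while step < len(bits):
        for x in range(len(bits)):
            if x & step:
                bits[x] ^= bits[x ^ step]
        step <<= 1
    return bits

def two_phase(bits, always_active):
    """Phase 1: fold each block of 2^always_active entries into one bit.
    Phase 2: run the Moebius transform on the shorter folded vector."""
    block = 1 << always_active
    folded = []
    for start in range(0, len(bits), block):
        acc = 0
        for b in bits[start:start + block]:
            acc ^= b
        folded.append(acc)
    return moebius_transform(folded)

random.seed(7)
n, t = 7, 3                                   # toy sizes instead of 70 and 3
table = [random.randint(0, 1) for _ in range(1 << n)]
full = moebius_transform(table)
fast = two_phase(table, t)
low = (1 << t) - 1
assert all(fast[h] == full[(h << t) | low] for h in range(1 << (n - t)))

# Cost comparison for the real parameters (exponents only).
print(f"full: 2^{log2(70 * 2**70):.1f}, two-phase: 2^{log2(2**70 + 67 * 2**67):.1f}")
```

The saving comes from never computing sums for masks in which one of the always-active bits is inactive, since no subspace used in the attack has that form.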

The time complexity analysis of the critical steps of the attack is as follows:

Step 4.a: $d \cdot |S| \cdot |U| \approx 8 \cdot 2^{70} \cdot 2^{31} = 2^{104}$

Step 4.b: $2^{104}$ (as noted above)

Step 4.c: $|U'| \cdot |W| \approx 2^{21.5} \cdot 2^{24} = 2^{45.5}$

Step 4.d.i: $|U| \cdot |V| \cdot r_1 \cdot 2^{16+d} \approx 2^{31} \cdot 2^{23.5} \cdot 3 \cdot 2^{24} \approx 2^{80.5}$

Step 4.d.ii: $|W| \cdot |U| \cdot |V| \approx 2^{24} \cdot 2^{31} \cdot 2^{23.5} = 2^{78.5}$

Step 5: $|W|^3 \approx 2^{24 \cdot 3} = 2^{72}$

In total, the time complexity of the attack is about $2^{105}$ bit operations,
dominated by steps 4.a and 4.b. The data complexity is $2^{70}$ chosen plaintexts.
The memory complexity is dominated by the storage of ciphertexts in Step 1,
and is about $|S| \cdot 2^{8} = 2^{78}$ bits.

Extending the Attack to Full LowMC-128 for Weak Instances. Similarly
to the attacks on LowMC-80, the 11-round attack on LowMC-128 can be
extended by an additional round with no increase in complexity for a subset
of weak instances. However, the fraction of these instances is much smaller, as
the I-part of LowMC-128 contains only 67 bits, and is smaller than the one of
LowMC-80. A similar analysis to the one of Sect. 6.3 shows that the fraction of
such weak instances for LowMC-128 is roughly $2^{67+67-256} = 2^{-122}$. As noted
in the Introduction, this attack does not violate the formal security claims of the
LowMC designers.

7.2 An Attack on Full LowMC-128 

We now describe our attack on full (12-round) LowMC-128. This attack is more 
marginal than the previous attacks, and we have to use essentially all of our 
previously described optimizations, as well as new ones in order to obtain an 
attack which is faster than exhaustive search. 

In order to attack 12 rounds of LowMC-128, we extend the interpolation of
the 11-round attack past another round, interpolating $Z_7|IP$ through $r_1 = 4$
Sbox layers, and hence $d = 2^{4} = 16$ and
$|U| = \binom{256+3m}{\le 2^{r_1-1}} = \binom{445}{\le 8} \approx 2^{55}$.

In this attack we use $sp = 8$, and compute the size of the relevant variable
sets $|U'| \le \binom{256}{\le sp-1} = \binom{256}{\le 7} \approx 2^{43.5}$,
$|V| \le \binom{\kappa}{\le d-sp} = \binom{128}{\le 8} \approx 2^{40.5}$,
$|W| = |U'| + |V| \approx 2^{44}$. We use the refined analysis for steps 4.d.i and
4.d.ii, and thus we also calculate
$|U_{sp}| = |U_8| = \binom{256}{8} < 2^{48.5}$ and
$|V_{\le(d-sp-1)}| = \binom{128}{\le 7} < 2^{36.5}$.
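The set sizes quoted for the 12-round attack can be recomputed from the binomial bounds with exact integer arithmetic. The sketch below uses our own helper `binom_le` (assuming $\binom{n}{\le k} = \sum_{i=0}^{k}\binom{n}{i}$) and prints the exponents, which can be compared with the rounded figures in the text.

```python
from math import comb, log2

def binom_le(n, k):
    """Sum of the binomial coefficients C(n, 0) + ... + C(n, k)."""
    return sum(comb(n, i) for i in range(k + 1))

kappa, m, r1, sp = 128, 63, 4, 8      # full LowMC-128 attack parameters
d = 2 ** r1
sizes = {
    "U": binom_le(256 + 3 * m, 2 ** (r1 - 1)),
    "U'": binom_le(256, sp - 1),
    "V": binom_le(kappa, d - sp),
    "U_sp": comb(256, sp),
    "V_<=(d-sp-1)": binom_le(kappa, d - sp - 1),
}
sizes["W"] = sizes["U'"] + sizes["V"]
for name, value in sizes.items():
    print(f"|{name}| <= 2^{log2(value):.2f}")
```

The computed exponents agree with the quoted ones to within a fraction of a bit, which is the precision at which the complexity estimates are stated.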

The High-Order Differential Property. We can try to mount the attack
with high-order differentials on subspaces of dimension 64 for the target bit in
$Z_7|IP$, but this results in an attack which is at best very marginally faster than
exhaustive search. The main new optimization introduced in this attack is the
use of reduced subspaces of dimension 60. Obviously, the result of a high-order
differentiation over such a subspace is not a constant, but (as we show next)
its algebraic degree in the key bits is bounded by 8. Consequently, the resultant
function (polynomial) of each high-order differentiation can be expressed in
terms of our reduced variable set $V = V_{\le 8}$. This polynomial can be
interpolated during preprocessing and does not contribute additional variables to
the equation system.

We select a big subspace $S$ of dimension 73 that contains all the 67 bits of
$X_0|IP$ and 6 additional bits of 2 active Sboxes in $X_0|SP$, and (similarly to the
11-round attack) define the 60-dimensional subspaces according to their
$73 - 60 = 13$ constant bits in $X_0|IP$. The number of such subspaces is
$\binom{67}{13} > 2^{44} \approx |W|$, and therefore they should suffice for the
attack.

In order to show that the result of a high-order differentiation of the target
bit in $Z_7|IP$ over a selected 60-dimensional subspace is of degree at most 8 in
the key bits, consider the state $Z_0$ obtained after the first Sbox layer. The
algebraic degree of the target bit $b$ (selected from $Z_7|IP$) in $Z_0$ is bounded by
$2^{6} = 64$. As the linear subspace undergoes a one-to-one transformation in the
first Sbox layer (through the 2 fully active Sboxes), it remains a linear subspace
in $Z_0$. Therefore, the algebraic degree of the high-order differentiation in the
bits of $Z_0$ and the key is upper-bounded by $64 - 60 = 4$. Since each bit of $Z_0$
is a polynomial in the key of degree (at most) 2, the algebraic degree of the
high-order differentiation in the bits of the key is upper-bounded by
$4 \cdot 2 = 8$, as claimed.


The Preprocessing Phase. The main change in this attack compared to the
one of Sect. 5.2 is in preprocessing Step 1, where in addition to interpolating
the $e \approx |W|$ free coefficients, we interpolate the
$e \cdot |V| \approx |W| \cdot |V|$ coefficients of $V$ (since we selected
60-dimensional subspaces instead of 64-dimensional subspaces). The modified
preprocessing step is described below. It is similar to the variable
transformation algorithm of Sect. 5.1, interpolating first over the plaintexts and
then over the keys. Note that the matrix $E$ of linear equations is allocated and
initialized already at this stage.


1. Allocate an $e \times |W|$ matrix $E$ over $GF(2)$, representing the (reduced)
equation system on $W$. The matrix is vertically decomposed into two
smaller matrices: $E_1$ of size $e \times |U'|$ and $E_2$ of size $e \times |V|$.

2. Allocate an $e \times |V|$ evaluation matrix $EV$.

3. Allocate a $|S| = 2^{73}$ bit array $a_1$ for the evaluations of the target bit $b$.

4. For each key in the set $\{K \mid wt(K) \le 8\}$ (with index $\ell$):

(a) Evaluate $b$ (the target bit) on the set $S$ of $2^{73}$ inputs (with the fixed
key $K$) and store the result in $a_1$.

(b) Apply the Moebius transform on $a_1$.

(c) Populate column $\ell$ of $EV$: For each subspace $S_j$ in $S$, namely
$S_1, \ldots, S_e$, obtain its corresponding sum from $a_1$ and copy it to
$EV[j][\ell]$.

5. For each equation $1, \ldots, e$ (with index $j$):

(a) For each $M_v \in V_{\le 8} = V$ (with index $\ell$):

i. Sum the $2^{wt(v)}$ values of $EV[j]$ calculated for the subset of keys
$\{K \mid \bar{v} \wedge K = 0\}$, and store the result in $E_2[j][\ell]$.


We first note that similarly to the 11-round attack, the complexity of the
Moebius transform can be optimized (due to the way that we selected the
subspaces) in a 2-step process from $73 \cdot 2^{73}$ to
$2^{73} + 67 \cdot 2^{67} \approx 2^{74}$.

We analyze the complexity of the computationally heavy steps 4 and 5. The
complexity of Step 4.a (for all $\{K \mid wt(K) \le 8\}$) is
$|V| \cdot |S| \cdot 2^{19} \approx 2^{40.5} \cdot 2^{73} \cdot 2^{19} = 2^{132.5}$. The
complexity of Step 4.b (using the optimized Moebius transform) is
$|V| \cdot 2^{74} \approx 2^{114.5}$. The complexity of Step 4.c is
$e \cdot |V| \approx |W| \cdot |V| \approx 2^{44} \cdot 2^{40.5} = 2^{84.5}$. The
complexity of Step 5.a.i is bounded by
$e \cdot |V| \cdot 2^{8} \approx 2^{44} \cdot 2^{40.5} \cdot 2^{8} = 2^{92.5}$. In total,
Step 4.a dominates the time complexity, which is about $2^{132.5}$ bit
operations.


Analysis of the Full Attack. In terms of time complexity, the analysis of
the critical steps of the online attack is as follows (using the optimized Moebius
transform and the refined analysis for steps 4.d.i and 4.d.ii):

Step 4.a: $d \cdot |S| \cdot |U| \approx 16 \cdot 2^{73} \cdot 2^{55} = 2^{132}$

Step 4.b: $|U| \cdot 2^{74} \approx 2^{129}$

Step 4.c: $|U'| \cdot |W| \approx 2^{43.5} \cdot 2^{44} = 2^{87.5}$

Step 4.d.i: $|U_{sp}| \cdot |V| \cdot (r_1 \cdot 2^{16+sp} + 2^{d-sp}) + |U| \cdot |V_{\le(d-sp-1)}| \cdot r_1 \cdot 2^{16+d} \approx 2^{48.5} \cdot 2^{40.5} \cdot (4 \cdot 2^{24} + 2^{8}) + 2^{55} \cdot 2^{36.5} \cdot 4 \cdot 2^{32} \approx 2^{115} + 2^{125.5} \approx 2^{125.5}$

Step 4.d.ii: $|W| \cdot (|U_{sp}| \cdot |V| + |U| \cdot |V_{\le(d-sp-1)}|) \approx 2^{44} \cdot (2^{48.5} \cdot 2^{40.5} + 2^{55} \cdot 2^{36.5}) \approx 2^{44} \cdot (2^{89} + 2^{91.5}) \approx 2^{136}$

Step 5: $|W|^3 \approx 2^{44 \cdot 3} = 2^{132}$

The online phase complexity is about $2^{136}$, dominated by Step 4.d.ii.$^{3}$ The
total complexity of the attack is less than $2^{137}$ bit operations, which is about
$2^{128+19-137} = 2^{10}$ times faster than exhaustive search (including the
preprocessing phase, whose complexity is about $2^{132.5}$). The data complexity of
the attack is $2^{73}$ chosen plaintexts. The memory complexity is dominated by the
storage of $E$, whose size is about $|W| \cdot |W| \approx 2^{88}$ bits.

Note that without the variable transformation, merely Step 5 (Gaussian
elimination) would require about $2^{55 \cdot 3} = 2^{165}$ bit operations, which is
much slower than exhaustive search.$^{4}$

8 Conclusions 

In this paper, we introduced new techniques for interpolation attacks, including 
a new variable transformation algorithm that can lead to savings in their data 
and time complexities. We applied the optimized interpolation attack to LowMC, 
and refuted the claims of the designers regarding the security level of both the 80 
and 128-bit key variants. As a future work item, it will be interesting to optimize 
our techniques further and apply them to additional block ciphers. 


3 We note that the analysis of Step 4.d.ii can be refined further, and its actual
complexity is lower by a factor between 2 and 4. Moreover, the actual algorithm of
this step can be optimized, but we do not consider such low-level optimizations
here for the sake of simplicity.

4 Solving the equation system remains slower than exhaustive search even when using
more advanced algorithms which are based on Strassen's algorithm [9], requiring
about $2^{55 \cdot 2.8} = 2^{154}$ bit operations. While there are known algorithms
that perform better in theory, most of them are very complex and inefficient in
practice.

Abstract. Sprout is a new lightweight stream cipher with a shorter internal
state proposed at FSE 2015, using key-dependent state updating in
the keystream generation phase. Some analyses have been available on
eprint so far. In this paper, we extend the design paradigm in general
and study the security of Sprout-like ciphers in a unified framework. Our
new line of attack is to investigate the $k$-normality of the augmented
function, a vectorial Boolean function derived from the primitive. Based on
it, a dedicated time/memory/data tradeoff attack is developed for such
designs. It is shown that Sprout can be broken in $2^{79-x-y}$ time, given
$[c \cdot (2x + 2y - 58) \cdot 2^{71-x-y}]$-bit memory and $2^{9+x+y}$-bit keystream,
where $x$/$y$ is the number of forward/backward steps and $c$ is a small
constant. Our attack is highly flexible and compares favorably to all
the previous results. With carefully chosen parameters, the new attack
is at least $2^{20}$ times faster than the Lallemand/Naya-Plasencia attack at
Crypto 2015, the Maitra et al. attack and the Banik attack, and $2^{10}$ times
faster than the Esgin/Kara attack with much less memory.


Keywords: Cryptanalysis • Stream ciphers • Sprout • Tradeoff


1 Introduction 

The design of secure lightweight stream ciphers for constrained hardware
environments is important both in theory and practice. The most area/power
consuming component in a lightweight design is the number of memory gates, which
corresponds to the internal state size of the primitive. On the other hand, a
common rule of thumb for stream cipher design is that the internal state size
should be at least twice as long as the key size to resist time/memory/data (TMD)
tradeoff attacks [4].

This design principle indeed works, and the security analysis of the eSTREAM
finalists, e.g., Grain v1, Mickey v2 and Trivium [7], has evolved rather slowly. At
FSE 2015, another design paradigm for stream ciphers was proposed and instantiated
by a new design, called Sprout, aiming to reduce the internal state size, and thus
the hardware area, by using key-dependent state updating in the keystream
generation phase [2]. It is expected that the immunity against TMD tradeoff
attacks will not be compromised.

Surprisingly, several cryptanalyses of Sprout have appeared on the IACR eprint
monthly after ESC 2015 and FSE 2015. In chronological order of the open
literature, a related-key chosen-IV attack on Sprout is presented in [9], but
the designers had already ruled out the related-key model in [2]. Then the first
attack in the single-key model was found in [12], using a list merging technique
with a time complexity around $2^{69}$ Sprout encryptions, and presented at Crypto
2015. In [13], another attack based on a SAT solver is given with a complexity of
$2^{54}$ attempts, where each attempt takes a time equivalent to
$6.6 \cdot 2^{54} \cdot 2^{e}$ encryptions, which is more than $2^{80}$ if $e > 23$.
Thus, it is questionable whether the work in [13] translates into a feasible attack
on Sprout or not. Directly challenging the design rationale, Esgin and Kara
presented a TMD tradeoff attack in [8] with an online time complexity of $2^{33}$
Sprout encryptions and 770 TB of memory after a pre-computation of around
$2^{53}$ basic operations. Finally, in [3], a key recovery attack is launched
against Sprout with a complexity of $2^{66.7}$ Sprout encryptions, together with
some other analysis results.

In this paper, we extend the design paradigm in general and study the security
of Sprout-like ciphers in a unified framework. The model involves the secret
key not only in the initialization process but also in the non-linear state updating
in a Sprout-like manner during the keystream generation phase. Then, based on
the notion of normality first introduced by Dobbertin in [6], we investigate the
$k$-normality of the augmented function [5], a vectorial Boolean function derived
from the underlying primitive. This property is relevant for the design and analysis
of cryptosystems. In [14] and [15], security implications of $k$-normal Boolean
functions are considered when they are employed in certain stream ciphers. We
make a systematic security analysis based on this property for Sprout-like stream
ciphers and develop a dedicated TMD tradeoff attack framework for such designs.
In particular, it is shown that Sprout can be broken in $2^{79-x-y}$ time, given
$[c \cdot (2x + 2y - 58) \cdot 2^{71-x-y}]$-bit memory and $2^{9+x+y}$-bit keystream, where $x$ is
the number of forward steps, $y$ is the number of backward steps and $c$ is a small
constant. Our attack is highly flexible and compares favorably to all the previous
attacks on Sprout. With carefully chosen attack parameters, our method is at
least $2^{20}$ times faster than the Lallemand/Naya-Plasencia attack at Crypto 2015,
the Maitra et al. attack and the Banik attack, and $2^{10}$ times faster than the
Esgin/Kara attack with much less memory. Practical simulations confirmed our analysis.

This paper is structured as follows. In Sect. 2, the stream cipher Sprout is
described and generalized to a generic Sprout-like model. In Sect. 3, based on
a natural extension of normality from Boolean functions to vectorial Boolean
functions, a generic TMD cryptanalysis framework of such ciphers is formalized
with complexity analysis. In Sect. 4, the framework is applied to Sprout with
comparisons to other attacks. Section 5 provides the experimental results. Finally,
some conclusions are given in Sect. 6.

2 Sprout-Like Stream Ciphers

In this section, a brief description of Sprout that is relevant to our work and 
a generic Sprout-like model that inherits the design spirit are presented. The 
following notations will be used throughout the paper. 

- $L^t = [l_t, l_{t+1}, \ldots, l_{t+39}]$, the internal state of the LFSR at time $t$.

- $N^t = [n_t, n_{t+1}, \ldots, n_{t+39}]$, the internal state of the NFSR at time $t$.

- $[a, b] = \{a, a + 1, \ldots, b\}$, for two positive integers $a, b$ ($a < b$).

- $L^t_{[a,b]} = \{l_{t+a}, l_{t+a+1}, \ldots, l_{t+b}\}$ and $N^t_{[a,b]} = \{n_{t+a}, n_{t+a+1}, \ldots, n_{t+b}\}$, for two
positive integers $a, b$ ($a < b$).

- $IV = (iv_0, iv_1, \ldots, iv_{69})$, the 70-bit initialization vector.

- $K = (k_0, k_1, \ldots, k_{79})$, the 80-bit secret key.

- $k_t^*$, the round key bit generated at time $t$.

- $z_t$, the keystream bit generated at time $t$.

- $c_t^4$, the round constant at time $t$, generated by a counter.

2.1 Description of Sprout 

Sprout adopts a structure similar to the Grain family of stream ciphers [1,10, 11], 
which consists of four parts, an 80-bit fixed key register, a 40-bit NFSR with a 
linked 40-bit LFSR, and a counter register, depicted in Fig. 1. Since storing a 
fixed key requires less area size than realizing a register of the same length, it is 
reported in [2] that the hardware area of Sprout is significantly less compared 
to the existing lightweight stream ciphers. 


[Figure: block diagram showing the key K, the round key function, the NFSR and the LFSR]

Fig. 1. Keystream generation of Sprout


Denote the feedback functions of the NFSR, the LFSR and the nonlinear filter
function by $g$, $f$ and $h$ respectively. There is a 9-bit counter register in Sprout, of
which the lower 7 bits are a modulo 80 counter, denoted by $(c_t^0, c_t^1, \ldots, c_t^6)$
at time $t$. The 4-th LSB $c_t^4$ of the counter is employed in the keystream generation.
It should be noted that $c_t^4$ has a cycle of length 80, i.e., in each cycle, this
bit takes the values

$$\underbrace{0, \ldots, 0}_{16},\ \underbrace{1, \ldots, 1}_{16},\ \underbrace{0, \ldots, 0}_{16},\ \underbrace{1, \ldots, 1}_{16},\ \underbrace{0, \ldots, 0}_{16}.$$
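The 80-step cycle of the round constant bit can be reproduced with a short sketch. It assumes that "4-th LSB" means bit index 4 (weight 16) of the modulo 80 counter, which is the reading consistent with runs of length 16; the function name `round_constant_bit` is illustrative only.

```python
from itertools import groupby


def round_constant_bit(t: int) -> int:
    """Bit index 4 (weight 16) of a counter that counts modulo 80 (assumed reading)."""
    return ((t % 80) >> 4) & 1


# One full cycle of the bit, then its run-length encoding as (value, run length).
cycle = [round_constant_bit(t) for t in range(80)]
runs = [(bit, len(list(group))) for bit, group in groupby(cycle)]
print(runs)
```

Under this reading the cycle consists of five runs of length 16, alternating between 0 and 1 and starting with 0.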

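The 40-bit LFSR is updated recursively by $f$ as
$l_{t+40} = l_t \oplus l_{t+5} \oplus l_{t+15} \oplus l_{t+20} \oplus l_{t+25} \oplus l_{t+34}$.
The NFSR is updated recursively by a non-linear feedback function $g$ as

$$\begin{aligned}
n_{t+40} &= k_t^* \oplus l_t \oplus c_t^4 \oplus g(N^t) \\
 &= k_t^* \oplus l_t \oplus c_t^4 \oplus n_t \oplus n_{t+13} \oplus n_{t+19} \oplus n_{t+35} \oplus n_{t+39} \\
 &\quad \oplus n_{t+2} n_{t+25} \oplus n_{t+3} n_{t+5} \oplus n_{t+7} n_{t+8} \oplus n_{t+14} n_{t+21} \oplus n_{t+16} n_{t+18} \\
 &\quad \oplus n_{t+22} n_{t+24} \oplus n_{t+26} n_{t+32} \oplus n_{t+33} n_{t+36} n_{t+37} n_{t+38} \\
 &\quad \oplus n_{t+10} n_{t+11} n_{t+12} \oplus n_{t+27} n_{t+30} n_{t+31}.
\end{aligned}$$

Let $u_t = l_{t+4} \oplus l_{t+21} \oplus l_{t+37} \oplus n_{t+9} \oplus n_{t+20} \oplus n_{t+29}$; then

$$k_t^* = \begin{cases} k_t, & 0 \le t \le 79, \\ k_{t \bmod 80} \cdot u_t, & \text{otherwise.} \end{cases}$$

Given the internal state at time $t$, the keystream bit is generated as

$$z_t = h(n_{t+4}, l_{t+6}, l_{t+8}, l_{t+10}, l_{t+32}, l_{t+17}, l_{t+19}, l_{t+23}, n_{t+38}) \oplus l_{t+30} \oplus \bigoplus_{i \in A} n_{t+i},$$

where $A = \{1, 6, 15, 17, 23, 28, 34\}$, and the filter function is

$$h(\cdot) = n_{t+4} l_{t+6} \oplus l_{t+8} l_{t+10} \oplus l_{t+32} l_{t+17} \oplus l_{t+19} l_{t+23} \oplus n_{t+4} l_{t+32} n_{t+38}.$$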

During the key/IV setup phase, since the key is fixed, first load the IV in the
following way: $n_i = iv_i$, $0 \le i \le 39$; $l_i = iv_{i+40}$, $0 \le i \le 29$, and $l_i = 1$, $30 \le i \le 38$,
$l_{39} = 0$. Then run the cipher for 320 rounds as follows.

- the LFSR update function is changed to $l_{t+40} = z_t \oplus f(L^t)$.

- the NFSR update function is changed to $n_{t+40} = z_t \oplus k_t^* \oplus l_t \oplus c_t^4 \oplus g(N^t)$.

- no keystream bit is generated.

After the initialization phase, the keystream generation phase starts and the
keystream is no longer fed back.
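The description above can be transcribed into a small bit-level sketch. It is only an illustration of the update equations as stated in this section, not a reference implementation: the list-based bit ordering of the key and IV, the counter starting at 0 and running through the initialization, and the time index continuing after the 320 setup rounds are all assumptions, and the output has not been checked against official test vectors.

```python
def sprout_keystream(key, iv, nbits):
    """Sketch of Sprout: key is a list of 80 bits, iv a list of 70 bits."""
    assert len(key) == 80 and len(iv) == 70
    n = list(iv[:40])                      # NFSR: n_i = iv_i, 0 <= i <= 39
    l = list(iv[40:70]) + [1] * 9 + [0]    # LFSR: l_i = iv_{i+40}, then nine 1s, l_39 = 0
    taps = (1, 6, 15, 17, 23, 28, 34)      # the set A

    def step(t, init):
        # Filter function h and keystream bit z_t.
        h = ((n[4] & l[6]) ^ (l[8] & l[10]) ^ (l[32] & l[17])
             ^ (l[19] & l[23]) ^ (n[4] & l[32] & n[38]))
        z = h ^ l[30]
        for i in taps:
            z ^= n[i]
        # Round key bit k*_t and round constant bit c^4_t.
        u = l[4] ^ l[21] ^ l[37] ^ n[9] ^ n[20] ^ n[29]
        k_star = key[t] if t < 80 else key[t % 80] & u
        c4 = ((t % 80) >> 4) & 1
        # Feedback functions f (LFSR) and g (NFSR).
        f = l[0] ^ l[5] ^ l[15] ^ l[20] ^ l[25] ^ l[34]
        g = (n[0] ^ n[13] ^ n[19] ^ n[35] ^ n[39]
             ^ (n[2] & n[25]) ^ (n[3] & n[5]) ^ (n[7] & n[8])
             ^ (n[14] & n[21]) ^ (n[16] & n[18]) ^ (n[22] & n[24])
             ^ (n[26] & n[32]) ^ (n[33] & n[36] & n[37] & n[38])
             ^ (n[10] & n[11] & n[12]) ^ (n[27] & n[30] & n[31]))
        new_n = k_star ^ l[0] ^ c4 ^ g
        new_l = f
        if init:                           # keystream feedback during setup only
            new_n ^= z
            new_l ^= z
        n.pop(0)
        n.append(new_n)
        l.pop(0)
        l.append(new_l)
        return z

    for t in range(320):                   # initialization: output is discarded
        step(t, True)
    return [step(320 + i, False) for i in range(nbits)]
```

A call such as `sprout_keystream([0] * 80, [0] * 70, 64)` returns 64 keystream bits under these assumptions.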


2.2 A Model for Sprout-Like Stream Ciphers 

There are three functions involved in the model: a non-linear function $G(\cdot)$,
a linear function $F(\cdot)$ and a non-linear filter function $h(\cdot)$.

The internal state of the model consists of the non-linear state $N$ and the
linear state $L$. At each step, the function $G(\cdot)$ is applied to $N$ and $F(\cdot)$ to
$L$, respectively. Besides, there may also be some other mixing procedure that
XORs some bits of $N$ into $L$, and vice versa. Further, the secret key is involved
in the non-linear state updating selectively by a function $u(\cdot)$. The output of the
current state is computed as the XOR of some bits from both $N$ and $L$ and
a non-linear filter function $h(\cdot)$, which takes some input values from both $N$ and
$L$. Some notations that will be used in the description are listed
here.

- $L^t = [L^t_0, L^t_1, \ldots, L^t_{l_1-1}]$, the internal state of the linear component.

- $N^t = [N^t_0, N^t_1, \ldots, N^t_{l_2-1}]$, the internal state of the non-linear component.

- $rL^t$, a subset of $L^t$ and the linear part of $u(\cdot)$.

- $rN^t$, a subset of $N^t$ and the non-linear part of $u(\cdot)$.

- $pL^t$, a subset of $L^t$ with the variables of the filter
function $h(\cdot)$ coming from the LFSR.

- $pN^t$, a subset of $N^t$ with the variables of the filter
function $h(\cdot)$ coming from the NFSR.

- $qL^t$, a subset of $L^t$ and the linear masking in the
keystream generation function.

- $qN^t$, a subset of $N^t$ and the non-linear masking in the
keystream generation function.

- $pqN^t = pN^t \cup qN^t$, the variables used in the keystream generation coming
from the NFSR.

The general framework is specified by the following items (we only focus on 
the keystream generation phase). 

1. Components 

- The linear component is $L^t = [L^t_0, L^t_1, \ldots, L^t_{l_1-1}] \in F_2^{l_1}$, whose initial state
is denoted by $L^0$. It is updated recursively as $L^{t+1} = F(L^t)$. Without loss
of generality, we assume this process is invertible, and the inverse process is
$L^{t-1} = F'(L^t)$.

- The non-linear component is $N^t = [N^t_0, N^t_1, \ldots, N^t_{l_2-1}] \in F_2^{l_2}$, whose initial
state is denoted by $N^0$. It is updated recursively as

$$N^{t+1} = G(N^t \oplus L_1(L^t)) \oplus L_2(L^t) \oplus u(rL^t, rN^t) \cdot R(t, K) \oplus C_t,$$

where $G(\cdot)$ is an $(l_2, l_2)$-vectorial Boolean function and $C_t$ is a counter-related
vector of length $l_2$. Note that whether the key is involved in the state updating
depends on the value of $u(\cdot)$. If $u(rL^t, rN^t) = 1$, the key will be
involved. Similarly, we assume this non-linear process is invertible, and the
inverse process is computed as

$$N^{t-1} = G'(N^t \oplus L'_1(L^{t-1})) \oplus L'_2(L^{t-1}) \oplus u(rL^{t-1}, rN^{t-1}) \cdot R(t-1, K) \oplus C_{t-1}.$$

- A filter function $h(\cdot)$ from $F_2^{n_1+n_2}$ into $F_2$ is used as part of the output function
in the form $h(pL^t, pN^t)$, which takes $n_1$ input values from
$L^t$ and $n_2$ input values from $N^t$, respectively.

- A linear Boolean function $l(\cdot)$ from $F_2^{m_1+m_2}$ into $F_2$ is used as part of the output
function in the form $l(qL^t, qN^t)$, which takes $m_1$ input values
from $L^t$ and $m_2$ input values from $N^t$, respectively.

- An output function $\phi(\cdot) = l(\cdot) \oplus h(\cdot)$, which generates the keystream $\{z_t\}_{t \ge 0}$
based on the inputs taken from both $L^t$ and $N^t$, $t = 0, 1, \ldots$

2. Keystream Generation

The keystream $\{z_t\}_{t \ge 0}$ is recursively generated as

$$z_t = h(pL^t, pN^t) \oplus l(qL^t, qN^t), \quad t = 0, 1, \ldots$$

Let $U$ be a subspace of $F_2^m$ and denote its dimension by $\dim(U)$. Define
$\bar{U} := \{a \in F_2^m : a \notin U\} \cup \{0\}$ as the complementary space of $U$. Now a coset
of the subspace $U$ is represented by $U_a := a \oplus U$, $a \in \bar{U}$, also called a flat. The
following definitions are needed in our model.

Definition 1. An $m$-variable Boolean function $f$ is $k$-normal (resp. $k$-weakly
normal) if there exists a flat $V \subseteq F_2^m$ of dimension $k$ such that $f$ is constant
(resp. affine) on $V$.

For example, the 5-variable Boolean function $h(\cdot)$ in Grain-v1 is 2-normal
and 3-weakly normal, and the 9-variable Boolean function $h(\cdot)$ in Sprout and
Grain-128a is 5-normal.
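The 5-normality of the Sprout filter function can be checked by exhibiting one flat. The sketch below uses the expression for $h$ given in Sect. 2.1 and fixes four inputs to zero; the particular choice of fixed inputs (`FIXED`) is one convenient flat picked for illustration, not necessarily the one intended by the authors.

```python
from itertools import product


def h(x):
    """Sprout filter function on inputs ordered as
    (n4, l6, l8, l10, l32, l17, l19, l23, n38)."""
    n4, l6, l8, l10, l32, l17, l19, l23, n38 = x
    return (n4 & l6) ^ (l8 & l10) ^ (l32 & l17) ^ (l19 & l23) ^ (n4 & l32 & n38)


# Illustrative flat: fix n4 = l8 = l32 = l19 = 0 (positions 0, 2, 4, 6).
FIXED = {0: 0, 2: 0, 4: 0, 6: 0}
FREE = [i for i in range(9) if i not in FIXED]


def values_on_flat():
    """Set of values taken by h when the free coordinates range over all bits."""
    seen = set()
    for bits in product((0, 1), repeat=len(FREE)):
        x = [FIXED.get(i, 0) for i in range(9)]
        for i, bit in zip(FREE, bits):
            x[i] = bit
        seen.add(h(x))
    return seen


print(len(FREE), values_on_flat())
```

Since every product term of $h$ contains at least one fixed zero input, $h$ is constant on this 5-dimensional flat, which matches the claim in the text.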

Next, we study a natural generalization of the above definition for vectorial 
Boolean functions [5]. 

Definition 2. An $(m, n)$-function $F: F_2^m \to F_2^n$ is called $k$-normal if there
exists a flat $V \subseteq F_2^m$ of dimension $k$ such that $F$ is constant on $V$.

In our analysis, we investigate the $k$-normality of the augmented function defined
as follows.

Definition 3. For an $(n_1 + n_2)$-variable Boolean function $h(pL^t, pN^t)$, the
$(b + f + 1)$-th augmented function of $h$, $H^{(b,f)}: F_2^{M_1+M_2} \to F_2^{b+f+1}$, is defined as

$$H^{(b,f)}(PL^t, PN^t) = \big(h(pL^{t-b}, pN^{t-b}), \ldots, h(pL^t, pN^t), \ldots, h(pL^{t+f}, pN^{t+f})\big),$$

where $b, f$ are two positive integers, and

$$PL^t = \bigcup_{i=-b}^{f} pL^{t+i}, \quad M_1 = |PL^t| \le \sum_{i=-b}^{f} |pL^{t+i}| = n_1 (b + f + 1),$$

$$PN^t = \bigcup_{i=-b}^{f} pN^{t+i}, \quad M_2 = |PN^t| \le \sum_{i=-b}^{f} |pN^{t+i}| = n_2 (b + f + 1).$$


3. Assumptions

- 3.1: there exist two positive integers $b, f$ such that $\bigcup_{i=-b}^{f} pqN^{t+i} \subseteq N^t$
for any $t \ge b$. In this case, the output segment $z_{t-b}, \ldots, z_t, \ldots, z_{t+f}$ can be
computed from the complete state $(L^t, N^t)$ at time $t$.

- 3.2: the $(b + f + 1)$-th augmented function of the filter function $h$, $H^{(b,f)}$, is
a $k$-normal Boolean function such that $H^{(b,f)}(x_1, \ldots, x_n) = 0^{b+f+1}$ when $x_j$ is
fixed for all $j \in \Omega$, where $\Omega$ is a subset of $[1, n]$ and $|\Omega| = n - k$.

- 3.3: there exist two positive integers $d, e$ such that $\bigcup_{i=-d}^{e} rN^{t+i} \subseteq N^t$ for any
$t \ge d$. In this case, $u(rL^{t+i}, rN^{t+i})$, $i = -d, \ldots, -1, 0, 1, \ldots, e$, can be computed
from the complete state $(L^t, N^t)$ at time $t$.

- 3.4: assume $pqN^{t+f+1} \not\subseteq N^t$ and $pqN^{t+f+1} \subseteq N^{t+1}$ for any $t \ge b$, meaning
that we cannot get $pqN^{t+f+1}$ from the state $(L^t, N^t)$. Note that the secret
key is incorporated in the non-linear state updating selectively: if we assume a
special state $(L^t, N^t)$ such that $u(rL^t, rN^t) = 0$, then $N^{t+1}$ can be computed from
$(L^t, N^t)$, and thus we further get the output bit $z_{t+f+1}$. Repeat this process for $x$
steps, i.e., assume a special state $(L^t, N^t)$ such that $u(rL^{t+i}, rN^{t+i}) = 0$
for $i = 0, 1, \ldots, x - 1$; then we get the output bits $z_{t+f+1}, \ldots, z_{t+f+x}$.

- 3.5: assume $rN^{t+e+1} \not\subseteq N^t$ and $rN^{t+e+1} \subseteq N^{t+1}$ for any $t \ge d$. For the above
special state $(L^t, N^t)$ such that $u(rL^{t+i}, rN^{t+i}) = 0$ for $i = 0, 1, \ldots, x - 1$,
if $x - 1 \le e$, we have only unknowns from $(L^t, N^t)$; if $x - 1 > e$, then the
unknowns from $N^{t+1}, N^{t+2}, \ldots$ will appear with some nonlinear equations
$N^{t+j+1} = G(N^{t+j} \oplus L_1(L^{t+j})) \oplus L_2(L^{t+j}) \oplus C_{t+j}$, $j = 0, 1, \ldots, x - e - 2$.

- 3.6: assume $pqN^{t-b-1} \not\subseteq N^t$ and $pqN^{t-b-1} \subseteq N^{t-1}$ for any $t > b$, which
means we cannot get $pqN^{t-b-1}$ from the state $(L^t, N^t)$. If we assume a special
state $(L^t, N^t)$ such that $u(rL^{t-1}, rN^{t-1}) = 0$, then $N^{t-1}$ can be computed from
$(L^t, N^t)$, and thus we further get the output bit $z_{t-b-1}$. Repeat this process for $y$
steps, i.e., assume a special state $(L^t, N^t)$ such that $u(rL^{t-j}, rN^{t-j}) = 0$
for $j = 1, \ldots, y$; then we get the output bits $z_{t-b-1}, \ldots, z_{t-b-y}$.

- 3.7: assume $rN^{t-d-1} \not\subseteq N^t$ and $rN^{t-d-1} \subseteq N^{t-1}$ for any $t > d$. For the
above special state $(L^t, N^t)$ such that $u(rL^{t-j}, rN^{t-j}) = 0$ for $j = 1, \ldots, y$, if
$y \le d$, we have only unknowns from $(L^t, N^t)$; if $y > d$, then the unknowns
from $N^{t-1}, N^{t-2}, \ldots$ will appear with some nonlinear equations
$N^{t-j-1} = G'(N^{t-j} \oplus L'_1(L^{t-j-1})) \oplus L'_2(L^{t-j-1}) \oplus C_{t-j-1}$, $j = 0, 1, \ldots, y - d - 1$.

It is easy to check that the proposed model includes a number of primitives,
e.g., Sprout and the Grain family. For the Grain family, the term $u(rL^t, rN^t) = 0$
for any time $t$. For Sprout, $N^t = [n_t, n_{t+1}, \ldots, n_{t+39}]$, $L^t = [l_t, l_{t+1}, \ldots, l_{t+39}]$, and
for any $t$, $u(rL^t, rN^t) = l_{t+4} \oplus l_{t+21} \oplus l_{t+37} \oplus n_{t+9} \oplus n_{t+20} \oplus n_{t+29}$. The positive
integers $b, f, d, e$ are $b = 1$, $f = 1$, $d = 9$, $e = 10$ respectively.

3 A TMD Tradeoff Attack Framework 

In this section, we provide a systematic security analysis for Sprout-like stream
ciphers. A dedicated TMD tradeoff attack framework is developed for such
designs based on the $k$-normality of the augmented function.

The goal of cryptanalysis is to recover the internal state which has generated
a sample segment, and if possible, given the internal state, to further restore the
secret key. There are two phases in the framework: the pre-processing phase and
the processing phase. The offline pre-processing phase is performed only once
and is independent of the employed secret key and the keystream sample.

3.1 Pre-Processing Phase

In the offline pre-processing phase, some tables are prepared which will be used
later in the processing phase. Given the parameters $l_1, l_2$ and $b, f, d, e, x, y$, define
a two-dimensional counter array $C = [C_{t-y}, \ldots, C_{t-1}, C_t, \ldots, C_{t+(x-1)}]$. We
construct the State-Keystream pair tables as follows.

1. Under the assumptions in the model, construct a system of equations which
implies a “special” state $(L^t, N^t)$ satisfying the following conditions.

- (1.1) $H^{(b,f)}(PL^t, PN^t) = 0^{b+f+1}$ and $l(qL^{t+i}, qN^{t+i}) = 0$, for $i = -b, \ldots, -1, 0, 1, \ldots, f$.

- (1.2) $u(rL^{t+i}, rN^{t+i}) = 0$ for $i = 0, 1, \ldots, x - 1$, from which we can get the
output bits $z_{t+f+1}, \ldots, z_{t+f+x}$.

- (1.3) $u(rL^{t-j}, rN^{t-j}) = 0$ for $j = 1, \ldots, y$, from which we can get the output
bits $z_{t-b-1}, \ldots, z_{t-b-y}$.

2. Suppose Assumptions 3.2, 3.5 and 3.7 hold,

- if $x - 1 \le e$ and $y \le d$, the above system of equations has only unknowns
from the state $(L^t, N^t)$.

- if $x - 1 > e$ and $y \le d$, the unknowns from $N^{t+1}, N^{t+2}, \ldots$ will appear with
some non-linear equations:

$$N^{t+j+1} = G(N^{t+j} \oplus L_1(L^{t+j})) \oplus L_2(L^{t+j}) \oplus C_{t+j}, \quad j = 0, 1, \ldots, x - e - 2.$$

Define another counter array $C' = [C_t, C_{t+1}, \ldots, C_{t+(x-e-2)}]$; note that the
round constant vectors in $C'$ are involved in these equations.

- if $x - 1 \le e$ and $y > d$, the unknowns from $N^{t-1}, N^{t-2}, \ldots$ will appear with
some nonlinear equations:

$$N^{t-j-1} = G'(N^{t-j} \oplus L'_1(L^{t-j-1})) \oplus L'_2(L^{t-j-1}) \oplus C_{t-j-1}, \quad j = 0, 1, \ldots, y - d - 1.$$

Define the counter array $C' = [C_{t-(y-d)}, \ldots, C_{t-2}, C_{t-1}]$; the round constant
vectors in $C'$ are involved in these equations.

- if $x - 1 > e$ and $y > d$, the unknowns from $N^{t+1}, N^{t+2}, \ldots$ and $N^{t-1},
N^{t-2}, \ldots$ will appear with some nonlinear equations:

$$N^{t+j+1} = G(N^{t+j} \oplus L_1(L^{t+j})) \oplus L_2(L^{t+j}) \oplus C_{t+j}, \quad j = 0, 1, \ldots, x - e - 2,$$

$$N^{t-j-1} = G'(N^{t-j} \oplus L'_1(L^{t-j-1})) \oplus L'_2(L^{t-j-1}) \oplus C_{t-j-1}, \quad j = 0, 1, \ldots, y - d - 1.$$

Define the counter array $C' = [C_{t-(y-d)}, \ldots, C_{t-1}, C_t, C_{t+1}, \ldots, C_{t+(x-e-2)}]$; the
round constant vectors in $C'$ are involved in these equations.

3. For each possible counter array $C'$, solve the constructed system of equations
and get the special states $(L^t, N^t)$ satisfying 1 and 2. Memorize the special
state $(L^t, N^t)$ in the first column of a row in table $T_{C'}$; further, for this state
and for each possible counter array $C^* = C \setminus C'$, get the corresponding $(x + y)$
output bits $z_{t-b-1}, \ldots, z_{t-b-y}, z_{t+f+1}, \ldots, z_{t+f+x}$ and store them in the second
column as a sub-row in table $T_{C'}$.

Remarks. Denote the number of rows (in the first column) of table $T_{C'}$ as $2^r$;
if $r < x + y$, we only need to store $(x + y - r)$ output bits in the second column,
indexed by $r$ bits of the output. Next, let $Z^{(b+f+1)} = [z_{t-b}, \ldots, z_{t+f}] \in
F_2^{b+f+1}$; then an internal state satisfying the condition (1.1) implies $Z^{(b+f+1)} =
0^{b+f+1}$. Further, for each counter array $C'$, $N^{t+1}, \ldots, N^{t+x}$ and $N^{t-1}, \ldots, N^{t-y}$ can
be computed directly from a “special” state $(L^t, N^t)$ according to the non-linear
state updating function without involving the secret key.


3.2 Processing Phase 

Now we discuss how to recover the internal state which has generated a sample 
segment, and if possible, given the internal state, to further restore the secret 
key. The following two propositions have provided us a direct way of key recovery 
from an internal state candidate and some keystream bits. 

Proposition 1. For a special state $(L^t, N^t)$ satisfying the conditions (1.1) and
(1.2), $N^{t+1}, \ldots, N^{t+x}$ can be computed directly from the complete state $(L^t, N^t)$
and the non-linear state updating function without involving the secret key.
Besides, if $u(rL^{t+x}, rN^{t+x}) = 1$, we may get some secret key information
$R(t + x, K)$ when the keystream bit $z_{t+f+x+1}$ is known. Further, more key
information $R(t + x + j, K)$, $j = 0, 1, \ldots$ will probably be obtained when more keystream
bits $z_{t+f+x+j+1}$, $j = 0, 1, \ldots$ are known.

Proof. The first half is clear from the condition (1.2).

For a special state $(L^t, N^t)$, if $u(rL^{t+x}, rN^{t+x}) = 1$, the secret key information
$R(t + x, K)$ is incorporated into the updating of the non-linear part from $N^{t+x}$
to $N^{t+x+1}$. One can check that the keystream bit $z_{t+f+x+1}$ is dependent on
$N^{t+x+1}$. In a word, $R(t + x, K)$ is likely to affect (if $u(rL^{t+x}, rN^{t+x}) = 1$)
the keystream bit $z_{t+f+x+1}$. Accordingly, we may obtain some key information
$R(t + x, K)$ from $z_{t+f+x+1}$. This procedure can be repeated many times. □

Similar to the proof of Proposition 1, we have the following proposition.

Proposition 2. For a special state $(L^t, N^t)$ satisfying the conditions (1.1) and
(1.3), $N^{t-1}, \ldots, N^{t-y}$ can be computed directly from the complete state $(L^t, N^t)$
and the non-linear state updating function without involving the secret key.
Besides, if $u(rL^{t-y-1}, rN^{t-y-1}) = 1$, we may get some key information $R(t - y - 1, K)$
when the keystream bit $z_{t-b-y-1}$ is known. Further, more key information
$R(t - y - j, K)$, $j = 1, 2, \ldots$ will probably be obtained when more keystream bits
$z_{t-b-y-j}$, $j = 1, 2, \ldots$ are known.

By utilizing the pre-computed tables and the given keystream sample, the
processing phase is carried out as follows.

The Internal State Recovery Algorithm. Given the parameters $b, f, x, y$,
the tables $T_{C'}$, and the keystream sample $\{z_t\}_{t \ge 0}$, the processing steps are as
follows.

1. Search the keystream sequence $\{z_t\}_t$ for the next non-considered block of
$(b + f + 1)$ zeros. If there are no more blocks, output a flag that the algorithm
has failed.

2. For each detected block, compute the corresponding counter arrays $C$, $C'$ and
$C^*$ from the time $t$, compare the $x$-bit segment of the keystream subsequent
to the block and the $y$-bit segment prior to the block with the memorized $(x + y)$-bit
segments in the second column (the sub-row is indexed by $C^*$) of the table
$T_{C'}$, and do the following:

- If no match exists, go to Step 1.

- If the $(x + y)$-bit output segment matches a segment in table $T_{C'}$, go
to Step 3.

3. Read the corresponding state, and if appropriate, recover (part of) the secret
key according to Propositions 1 and 2 from this state and more keystream
bits.
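The search-and-match loop of the processing phase can be sketched as follows. This is a simplified illustration only: the table is modelled as a hypothetical Python dictionary `table` mapping an $(x+y)$-bit tuple to a stored state, and the indexing by the counter arrays $C'$ and $C^*$ is deliberately left out, so a real implementation would select the table and sub-row from the counter values at time $t$ first.

```python
def find_candidates(z, b, f, x, y, table):
    """Scan keystream z for blocks of (b+f+1) zeros and look up the
    surrounding (x+y) bits in a hypothetical pre-computed table.

    Returns a list of (t, state) pairs for every match found."""
    width = b + f + 1
    hits = []
    # s is the index of z_{t-b}; y bits before and x bits after must exist.
    for s in range(y, len(z) - width - x + 1):
        if any(z[s:s + width]):
            continue                       # not an all-zero block
        t = s + b
        before = tuple(z[s - y:s])[::-1]   # z_{t-b-1}, ..., z_{t-b-y}
        after = tuple(z[s + width:s + width + x])  # z_{t+f+1}, ..., z_{t+f+x}
        state = table.get(before + after)
        if state is not None:
            hits.append((t, state))
    return hits


# Toy usage with b = f = 1 and x = y = 2; the stored "state" is a placeholder.
toy_keystream = [1, 0, 0, 0, 0, 1, 1, 0]
toy_table = {(0, 1, 1, 1): "state-A"}
print(find_candidates(toy_keystream, 1, 1, 2, 2, toy_table))
```

Each returned pair is only a candidate; as in Step 3, the state would then be verified and used for key recovery with further keystream bits.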


3.3 Complexity Analysis 

In the Sprout-like model, the keystream bit is generated as $z_t = h(pL^t, pN^t) \oplus
l(qL^t, qN^t)$. For the $(b, f)$ derived from Assumption 3.1, we define a flat $V^{(b,f)}$
of dimension $\dim(V^{(b,f)})$ such that $H^{(b,f)}(\cdot) = 0^{b+f+1}$ over it, i.e.,

$$V^{(b,f)} = \{(L^t, N^t) : H^{(b,f)}(PL^t, PN^t) = 0^{b+f+1}\}.$$

We have the following lemma, which is closely related to the time complexity of
processing (table look-ups) of our proposed algorithm.

Lemma 1. Suppose $\Pr[u(\cdot) = 0] = p$ and assume all the events in (1.1), (1.2) and
(1.3) are independent. Then the probability that an internal state $(L^t, N^t)$ is a
special state satisfying the conditions (1.1), (1.2) and (1.3) simultaneously when
the keystream segment $Z^{(b+f+1)} = 0^{b+f+1}$ is

$$\Pr\big[(L^t, N^t) \text{ is a special state} \mid Z^{(b+f+1)} = 0^{b+f+1}\big] = \frac{1}{2^{l_1+l_2-\dim(V^{(b,f)})}} \cdot p^{x+y},$$

where $V^{(b,f)}$ is a flat such that $H^{(b,f)}(\cdot) = 0^{b+f+1}$ over it.

Proof. For any internal state $(L^t, N^t)$ and keystream segment $Z^{(b+f+1)}$, the
underlying assumptions directly imply the following:

$$\Pr\big[(L^t, N^t) \text{ is a special state}\big] = \frac{1}{2^{l_1+l_2-\dim(V^{(b,f)})}} \cdot \frac{1}{2^{b+f+1}} \cdot p^{x+y},$$

and $\Pr\big[Z^{(b+f+1)} = 0^{b+f+1}\big] = 2^{-(b+f+1)}$, and

$$\Pr\big[Z^{(b+f+1)} = 0^{b+f+1} \mid (L^t, N^t) \text{ is a special state}\big] = 1.$$

On the other hand,

$$\Pr\big[(L^t, N^t) \text{ is a special state} \mid Z^{(b+f+1)} = 0^{b+f+1}\big]
= \frac{\Pr\big[(L^t, N^t) \text{ is a special state}\big] \cdot \Pr\big[Z^{(b+f+1)} = 0^{b+f+1} \mid (L^t, N^t) \text{ is a special state}\big]}{\Pr\big[Z^{(b+f+1)} = 0^{b+f+1}\big]}
= \frac{1}{2^{l_1+l_2-\dim(V^{(b,f)})}} \cdot p^{x+y},$$

which yields the statement of the lemma. □

Theorem 1. Suppose $V^{(b,f)}$ is a flat such that $H^{(b,f)}(\cdot) = 0^{b+f+1}$ over it. Then the
complexities of the proposed generic algorithm for cryptanalysis are as follows:

(1) The processing data complexity is $D = 2^{l_1+l_2-\dim(V^{(b,f)})} \cdot 2^{b+f+1} \cdot p^{-(x+y)}$.

(2) The expected space complexity in the pre-processing phase is proportional
to the sum of the number of rows in each table $T_{C'}$.

(3) The processing (table look-ups) time complexity is proportional to
$2^{l_1+l_2-\dim(V^{(b,f)})} \cdot p^{-(x+y)}$.

(4) The pre-processing time complexity is equivalent to the workload for solving
the system of equations constructed.

Proof. The data complexity is determined by the probability that an internal
state is a special state satisfying conditions (1.1), (1.2) and (1.3) simultaneously
in the pre-processing phase, which is given in the proof of Lemma 1 as
$2^{-(l_1+l_2-\dim(V^{(b,f)}))} \cdot 2^{-(b+f+1)} \cdot p^{x+y}$. Thus we have
$D = 2^{l_1+l_2-\dim(V^{(b,f)})} \cdot 2^{b+f+1} \cdot p^{-(x+y)}$.

For each possible counter array $C'$, we have constructed the corresponding
table $T_{C'}$; thus the estimated space complexity is proportional to the sum of the
number of rows in each table $T_{C'}$.

In the processing phase, the expected number of table look-ups depends on
the probability that an internal state $(L^t, N^t)$ is a special state satisfying the
conditions (1.1), (1.2) and (1.3) simultaneously when the keystream segment
$Z^{(b+f+1)} = 0^{b+f+1}$, which is given in Lemma 1 as $2^{-(l_1+l_2-\dim(V^{(b,f)}))} \cdot p^{x+y}$.
Thus the number of table look-ups is $2^{l_1+l_2-\dim(V^{(b,f)})} \cdot p^{-(x+y)}$.

The pre-processing time complexity is determined by the workload for solving
the system of equations constructed. □
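The data and look-up complexities of Theorem 1 are easy to evaluate numerically. The sketch below works in $\log_2$ units; the function name `log2_complexities` and the sample values $x = y = 10$ are illustrative choices, and $p = 1/2$ is an assumption for Sprout. The values $l_1 = l_2 = 40$, $\dim(V) = 74$ and $b = f = 1$ are those the paper uses for Sprout in Sect. 4.

```python
from math import log2


def log2_complexities(l1, l2, dim_v, b, f, x, y, p):
    """log2 of the data complexity D and of the number of table look-ups
    from Theorem 1, for Pr[u(.) = 0] = p."""
    gap = l1 + l2 - dim_v            # exponent of 2^(l1 + l2 - dim V)
    steps = -(x + y) * log2(p)       # exponent contributed by p^-(x+y)
    return {"data": gap + (b + f + 1) + steps, "lookups": gap + steps}


# Sprout-like parameters; p = 1/2 and x = y = 10 are illustrative assumptions.
sprout = log2_complexities(40, 40, 74, 1, 1, 10, 10, 0.5)
print(sprout)
```

With these parameters the data exponent equals $9 + x + y$, in line with the keystream requirement quoted in the introduction.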


4 Cryptanalysis of Sprout 

In this section, we apply the framework proposed in Sect. 3 to Sprout with the 
comparisons to the previous relevant attacks. 


4.1 Fitting into the Model 

Sprout fits into the model with the parameters $l_1 = l_2 = 40$, which are the lengths
of the LFSR and the NFSR respectively. The keystream bit $z_t$ at time $t$ is generated as

$$z_t = h(n_{t+4}, l_{t+6}, l_{t+8}, l_{t+10}, l_{t+32}, l_{t+17}, l_{t+19}, l_{t+23}, n_{t+38})
\oplus l_{t+30} \oplus n_{t+1} \oplus n_{t+6} \oplus n_{t+15} \oplus n_{t+17} \oplus n_{t+23} \oplus n_{t+28} \oplus n_{t+34},$$

where $h(\cdot) = n_{t+4} l_{t+6} \oplus l_{t+8} l_{t+10} \oplus l_{t+32} l_{t+17} \oplus l_{t+19} l_{t+23} \oplus n_{t+4} l_{t+32} n_{t+38}$.
As described in Sect. 2, whether the secret key is involved in the NFSR state
updating is determined by the value $u_t = l_{t+4} \oplus l_{t+21} \oplus l_{t+37} \oplus n_{t+9} \oplus n_{t+20} \oplus n_{t+29}$.
To fit in the model, we have $rL^t = \{l_{t+4}, l_{t+21}, l_{t+37}\}$, $rN^t = \{n_{t+9}, n_{t+20}, n_{t+29}\}$
and the two parameters $d = 9$, $e = 10$ such that $\bigcup_{i=-9}^{10} rN^{t+i} \subseteq N^t$.

Let $pL^t = \{l_{t+6}, l_{t+8}, l_{t+10}, l_{t+17}, l_{t+19}, l_{t+23}, l_{t+32}\}$, $pN^t = \{n_{t+4}, n_{t+38}\}$,
$qL^t = \{l_{t+30}\}$, and $qN^t = \{n_{t+1}, n_{t+6}, n_{t+15}, n_{t+17}, n_{t+23}, n_{t+28}, n_{t+34}\}$. From
this we have $b = 1$, $f = 1$ to fit into the Sprout-like model. Given $(b, f) = (1, 1)$,

$$H^{(1,1)}(PL^t, PN^t) = \big(h(pL^{t-1}, pN^{t-1}), h(pL^t, pN^t), h(pL^{t+1}, pN^{t+1})\big),$$

where $PL^t = L^t_{[5,11]} \cup L^t_{[16,20]} \cup L^t_{[22,24]} \cup L^t_{[31,33]}$ and $PN^t = N^t_{[3,5]} \cup N^t_{[37,39]}$; thus
$H^{(1,1)}(\cdot)$ is a (24, 3)-vectorial Boolean function.
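The index sets of the augmented function follow mechanically from the tap positions of $h$. The sketch below shifts the LFSR and NFSR taps by $-b, \ldots, f$ and groups the result into intervals; the helper names `augmented_inputs` and `as_intervals` are illustrative.

```python
P_L = (6, 8, 10, 17, 19, 23, 32)   # LFSR taps of h (offsets from t)
P_N = (4, 38)                      # NFSR taps of h (offsets from t)


def augmented_inputs(taps, b, f):
    """Union of the tap offsets shifted by every i in [-b, f]."""
    return sorted({tap + shift for tap in taps for shift in range(-b, f + 1)})


def as_intervals(indices):
    """Group a sorted list of integers into maximal (start, end) intervals."""
    out = []
    for i in indices:
        if out and i == out[-1][1] + 1:
            out[-1] = (out[-1][0], i)
        else:
            out.append((i, i))
    return out


PL = augmented_inputs(P_L, 1, 1)
PN = augmented_inputs(P_N, 1, 1)
print(as_intervals(PL), as_intervals(PN), len(PL) + len(PN))
```

The union contains 18 LFSR positions and 6 NFSR positions, i.e. 24 inputs in total, which is fewer than the upper bound $(n_1 + n_2)(b + f + 1) = 27$ because shifted taps overlap.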

Suppose $n_{t+3} = n_{t+4} = n_{t+5} = 0$. By a computer computation, there
are 12096 ($> 2^{13}$) possible values for the following 16 bits of the LFSR such that
$H^{(1,1)}(\cdot) = 0^3$:

$$P^t = [\,l_{t+7}\, l_{t+8}\, l_{t+9}\, l_{t+10} \,\|\, l_{t+11}\, l_{t+16}\, l_{t+17}\, l_{t+18}
\,\|\, l_{t+19}\, l_{t+20}\, l_{t+22}\, l_{t+23} \,\|\, l_{t+24}\, l_{t+31}\, l_{t+32}\, l_{t+33}\,] \subseteq L^t.$$

For example, $P^t$ = 0x0000, 0x8000, 0x4000, 0xc000, ... when denoted by
hexadecimal digits. We denote all the 12096 ($> 2^{13}$) values of $P^t$ as $a_1, a_2, a_3,
a_4, \ldots, a_{12096}$ such that $a_1$ = 0x0000, $a_2$ = 0x8000, $a_3$ = 0x4000, $a_4$ = 0xc000, ...
respectively.
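The count of admissible 16-bit patterns can be reproduced by exhaustive search. The sketch below assumes that the leftmost bit of $P^t$ (the most significant bit of the hexadecimal value) is $l_{t+7}$, which is consistent with the listed examples; with $n_{t+3} = n_{t+4} = n_{t+5} = 0$, every term of $h$ containing an NFSR bit vanishes at times $t-1$, $t$ and $t+1$.

```python
# LFSR offsets of the 16 bits of P^t, most significant bit first (assumed order).
POS = (7, 8, 9, 10, 11, 16, 17, 18, 19, 20, 22, 23, 24, 31, 32, 33)


def h_reduced(l, s):
    """h at time t+s when n_{t+3} = n_{t+4} = n_{t+5} = 0, so that only the
    three purely linear-register product terms remain."""
    return (l[8 + s] & l[10 + s]) ^ (l[32 + s] & l[17 + s]) ^ (l[19 + s] & l[23 + s])


def admissible_values():
    """All 16-bit values of P^t for which H^(1,1) is the all-zero vector."""
    good = []
    for v in range(1 << 16):
        l = {pos: (v >> (15 - k)) & 1 for k, pos in enumerate(POS)}
        if all(h_reduced(l, s) == 0 for s in (-1, 0, 1)):
            good.append(v)
    return good


values = admissible_values()
print(len(values))
```

The search visits only $2^{16}$ candidates, so it runs in well under a second on ordinary hardware.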

For Sprout, we will use $2^{13}$ flats defined as follows:

$$V_i = \{(L^t, N^t) : P^t = a_i \text{ and } n_{t+j} = 0,\ j = 3, 4, 5\}, \quad i = 1, \ldots, 2^{13}.$$

Note that in each $V_i$, 19 bits of $(L^t, N^t)$ are fixed, so each $V_i$ has a dimension
of $\dim(V_i) = 61$, and $H^{(1,1)}(\cdot) = 0^3$ over $V_i$. Further, we define a flat $V$ as
$V = \bigcup_{i=1}^{2^{13}} V_i$. Thus the dimension of $V$ is $\dim(V) = 74$.

4.2 Cryptanalysis 

We first discuss how to construct tables that will be used in the processing phase. 

Pre-processing Phase. Given the parameters $x, y$, do the following:

1. Define a counter array as $C = [c^4_{t-y}, \ldots, c^4_t, c^4_{t+1}, \ldots, c^4_{t+(x-1)}]$ of size $|C| =
x + y$. For an internal state $(L^t, N^t)$ such that $n_{t+3} = n_{t+4} = n_{t+5} = 0$ (thus
there are 77 unknowns), construct a system of equations which implies a state
$(L^t, N^t)$ satisfying the following conditions.

- (a). $l(qL^{t+i}, qN^{t+i}) = 0$, for $i = -1, 0, 1$.

- (b). $u_{t+i} = 0$ for $i = 0, 1, \ldots, x - 1$, from which we can get the output
bits $z_{t+2}, \ldots, z_{t+x+1}$ (suppose the round constants $c^4_t, c^4_{t+1}, \ldots, c^4_{t+(x-1)}$ are
known).

- (c). $u_{t-j} = 0$ for $j = 1, \ldots, y$, from which we can get the output bits
$z_{t-2}, \ldots, z_{t-y-1}$ (suppose the round constants $c^4_{t-1}, c^4_{t-2}, \ldots, c^4_{t-y}$ are
known).

2. We discuss it in the following situations:

- Case 1: If $x \le 11$ and $y \le 9$, we have the corresponding system of $(3 +
x + y)$ linear equations with only 77 unknowns from the state $(L^t, N^t)$: 40
unknowns from $L^t$ and 37 unknowns from $N^t$.

$$\begin{cases}
l_{t+30+k} \oplus \big(\bigoplus_{i \in A} n_{t+i+k}\big) = 0, & k = -1, 0, 1 \\
l_{t+4+i} \oplus l_{t+21+i} \oplus l_{t+37+i} \oplus n_{t+9+i} \oplus n_{t+20+i} \oplus n_{t+29+i} = 0, & i = 0, \ldots, x - 1 \\
l_{t+4-j} \oplus l_{t+21-j} \oplus l_{t+37-j} \oplus n_{t+9-j} \oplus n_{t+20-j} \oplus n_{t+29-j} = 0, & j = 1, \ldots, y
\end{cases}$$

- Case 2: If $x \ge 12$ and $y \le 9$, in addition to the 77 unknowns from
the state $(L^t, N^t)$, the unknowns $n_{t+40}, n_{t+41}, \ldots, n_{t+40+(x-12)}$ will appear
with some non-linear equations. Thus we obtain a system of equations
with $(66 + x)$ unknowns and $(2x + y - 8)$ equations ($(3 + x + y)$ linear
equations and $(x - 11)$ non-linear equations). Define another counter array
$C' = [c^4_t, \ldots, c^4_{t+(x-12)}]$ of size $|C'| = x - 11$; note that the round
constants in $C'$ are involved in this system.

$$\begin{cases}
l_{t+30+k} \oplus \big(\bigoplus_{i \in A} n_{t+i+k}\big) = 0, & k = -1, 0, 1 \\
l_{t+4+i} \oplus l_{t+21+i} \oplus l_{t+37+i} \oplus n_{t+9+i} \oplus n_{t+20+i} \oplus n_{t+29+i} = 0, & i = 0, 1, \ldots, x - 1 \\
l_{t+4-j} \oplus l_{t+21-j} \oplus l_{t+37-j} \oplus n_{t+9-j} \oplus n_{t+20-j} \oplus n_{t+29-j} = 0, & j = 1, \ldots, y \\
n_{t+40+m} \oplus l_{t+m} \oplus c^4_{t+m} \oplus g(N^{t+m}) = 0, & m = 0, 1, \ldots, x - 12 \text{ (non-linear)}
\end{cases}$$

- Case 3: If $x \le 11$ and $y \ge 10$, in addition to the 77 unknowns from the
state $(L^t, N^t)$, the unknowns $n_{t-1}, n_{t-2}, \ldots, n_{t-(y-9)}$ will appear with some
non-linear equations. Thus we obtain a system of equations with $(68 + y)$
unknowns and $(x + 2y - 6)$ equations ($(3 + x + y)$ linear equations and $(y - 9)$
non-linear equations). Define $C' = [c^4_{t-(y-9)}, \ldots, c^4_{t-1}]$ of size $|C'| = y - 9$;
the round constants in $C'$ are involved in this system.

$$\begin{cases}
l_{t+30+k} \oplus \big(\bigoplus_{i \in A} n_{t+i+k}\big) = 0, & k = -1, 0, 1 \\
l_{t+4+i} \oplus l_{t+21+i} \oplus l_{t+37+i} \oplus n_{t+9+i} \oplus n_{t+20+i} \oplus n_{t+29+i} = 0, & i = 0, 1, \ldots, x - 1 \\
l_{t+4-j} \oplus l_{t+21-j} \oplus l_{t+37-j} \oplus n_{t+9-j} \oplus n_{t+20-j} \oplus n_{t+29-j} = 0, & j = 1, \ldots, y \\
n_{t-n} \oplus l_{t-n} \oplus c^4_{t-n} \oplus g'(N^{t-n+1}) = 0, & n = 1, \ldots, y - 9 \text{ (non-linear)}
\end{cases}$$

- Case 4: If $x \ge 12$ and $y \ge 10$, in addition to the 77 unknowns from
the state $(L^t, N^t)$, the unknowns $n_{t+40}, n_{t+41}, \ldots, n_{t+40+(x-12)}$ and
$n_{t-1}, n_{t-2}, \ldots, n_{t-(y-9)}$ will appear with some non-linear equations. Thus we
obtain a system of equations with $(57 + x + y)$ unknowns and $(2x + 2y - 17)$
equations ($(3 + x + y)$ linear equations and $(x + y - 20)$ non-linear equations).
Define $C' = [c^4_{t-(y-9)}, \ldots, c^4_{t-1}, c^4_t, \ldots, c^4_{t+(x-12)}]$ of size $|C'| = x + y - 20$;
the round constants in $C'$ are involved in the system.

$$\begin{cases}
l_{t+30+k} \oplus \big(\bigoplus_{i \in A} n_{t+i+k}\big) = 0, & k = -1, 0, 1 \\
l_{t+4+i} \oplus l_{t+21+i} \oplus l_{t+37+i} \oplus n_{t+9+i} \oplus n_{t+20+i} \oplus n_{t+29+i} = 0, & i = 0, 1, \ldots, x - 1 \\
l_{t+4-j} \oplus l_{t+21-j} \oplus l_{t+37-j} \oplus n_{t+9-j} \oplus n_{t+20-j} \oplus n_{t+29-j} = 0, & j = 1, \ldots, y \\
n_{t+40+m} \oplus l_{t+m} \oplus c^4_{t+m} \oplus g(N^{t+m}) = 0, & m = 0, 1, \ldots, x - 12 \text{ (non-linear)} \\
n_{t-n} \oplus l_{t-n} \oplus c^4_{t-n} \oplus g'(N^{t-n+1}) = 0, & n = 1, \ldots, y - 9 \text{ (non-linear)}
\end{cases}$$
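The bookkeeping of the four cases can be summarized in a few lines. The sketch below only counts unknowns and equations as stated in the case analysis (77 base unknowns, $3 + x + y$ linear equations, one extra unknown and one non-linear equation per forward step beyond 11 and per backward step beyond 9); the function name `system_size` is illustrative and no equation solving is attempted.

```python
def system_size(x, y):
    """Return (unknowns, linear equations, non-linear equations) for the
    Sprout system with x forward and y backward steps."""
    forward_extra = max(0, x - 11)    # new NFSR bits n_{t+40}, n_{t+41}, ...
    backward_extra = max(0, y - 9)    # new NFSR bits n_{t-1}, n_{t-2}, ...
    unknowns = 77 + forward_extra + backward_extra
    linear = 3 + x + y
    nonlinear = forward_extra + backward_extra
    return unknowns, linear, nonlinear


for x, y in [(10, 8), (14, 8), (10, 12), (14, 12)]:
    print((x, y), system_size(x, y))
```

The number of non-linear equations returned here is also the size $|C'|$ of the counter array whose round constants enter the system.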

3. For each possible counter array $C'$, solve the constructed system of equations.
Observe that all the round constants in $C'$ are added to the system linearly;
by guessing at most $2^{74-(x+y)}$ appropriate unknowns we can solve the system
and get $2^{74-(x+y)}$ solutions $(L^t, N^t)$ for each possible counter array $C'$.

4. For each possible counter array $C'$, check each of the $2^{74-(x+y)}$ solutions
$(L^t, N^t)$. If $(L^t, N^t) \in V_i$, i.e., $P^t = a_i$ for any $i = 1, 2, \ldots, 2^{13}$, store the
61-bit $(L^{*t}, N^{*t})$ in the first column of a row in table $T_{C',i}$, where $L^{*t} = L^t \setminus P^t$
and $N^{*t} = N^t \setminus \{n_{t+3}, n_{t+4}, n_{t+5}\}$. Further, for this state and for each possible
set of round constants of $C^* = C \setminus C'$, get the corresponding $(x + y)$ output bits
$(z_{t-y-1}, \ldots, z_{t-2}, z_{t+2}, \ldots, z_{t+x+1})$ and put them in the second column as a
sub-row in table $T_{C',i}$. Thus there are an expected $2^{58-x-y}$ rows in the first
column and $2^{58-x-y} \times Count(|C'|)$ rows in the second column, where $Count(n)$
represents the number of all the possible counter arrays of size $n$.

We list in Table 1 the number of all the possible counter arrays Count(n) of
size n.

Table 1. The size of the counter array n and the number Count(n) for all the possible
counter arrays

n          6   7   8   9  10  11  12  13  14  15  16  17
Count(n)  12  14  16  18  20  22  24  26  28  30  32  33

n         18  19  20  21  22  23  24  25  26  27  28  29
Count(n)  35  37  39  41  43  45  47  49  51  53  55  57

n         30  31  32  33  34  35  36  37  38  39  40  41
Count(n)  59  61  63  64  65  66  67  68  69  70  71  72

n         42  43  44  45  46  47  48  49  50  51  52  53
Count(n)  73  74  75  76  77  78  79  80  80  80  80  80


Remarks. First, it can be seen that the necessary and sufficient condition for a
state $(L^t, N^t)$ to be a "special" state is that $(L^t, N^t) \in V$ and the conditions (a),
(b) and (c) hold. Second, for each possible counter array $C$, the bits $n_{t+40}, \ldots, n_{t+40+(x-1)}$
and the corresponding NFSR bits preceding time $t$ can be computed directly from a special
state $(L^t, N^t)$ according to the state updating of the NFSR without involving the key information. Third,
denote the number of rows (in the first column) of table $T_{c',i}$ as $2^{r_i}$; if $r_i < x + y$,
we only need to store $(x + y - r_i)$ output bits in the second column, indexed
by $r_i$ bits of the output. Finally, in the pre-processing phase, we have obtained
$Count(|C'|) \times 2^{13}$ tables $T_{c',i}$, each having $2^{58-x-y}$ rows in the first column to
store "special" states and $2^{58-x-y} \times \frac{Count(x+y)}{Count(|C'|)}$ rows in the second column to
store the corresponding output bits.

Lemma 2. The probability that an internal state $(L^t, N^t)$ is a special state
(such that $(L^t, N^t) \in V$ and the conditions (a), (b) and (c) hold) when the
keystream segment $z_t^{(3)} = 0^3$ is given by the following:

$$\Pr\bigl[(L^t, N^t) \text{ is a special state} \mid z_t^{(3)} = 0^3\bigr] = 2^{-(6+x+y)}.$$

Proof. For any internal state $(L^t, N^t)$ and keystream segment $z_t^{(3)}$, the underlying
assumptions directly imply the following:

$$\Pr\bigl[(L^t, N^t) \text{ is a special state}\bigr] = \frac{2^{\dim(V)} \times 2^{-(3+x+y)}}{2^{40+40}} = 2^{-(9+x+y)},$$

$$\Pr\bigl[z_t^{(3)} = 0^3\bigr] = 2^{-3} \quad \text{and} \quad \Pr\bigl[z_t^{(3)} = 0^3 \mid (L^t, N^t) \text{ is a special state}\bigr] = 1.$$

On the other hand,

$$\Pr\bigl[(L^t, N^t) \text{ is a special state} \mid z_t^{(3)} = 0^3\bigr]
= \frac{\Pr\bigl[(L^t, N^t) \text{ is a special state}\bigr] \times \Pr\bigl[z_t^{(3)} = 0^3 \mid (L^t, N^t) \text{ is a special state}\bigr]}{\Pr\bigl[z_t^{(3)} = 0^3\bigr]}
= 2^{-(6+x+y)},$$

which yields the statement of the lemma. □


Next, we will present a State Checking and Key Recovery Mechanism specified
for Sprout, by which we can check whether a state candidate
is correct and, if so, recover the key.

State Checking and Key Recovery Mechanism. For a state candidate at
time $t$, $L^t = [l_t, l_{t+1}, \ldots, l_{t+39}]$, $N^t = [n_t, n_{t+1}, \ldots, n_{t+39}]$, create an 80-bit vector
$K$ for the possible values associated with it:

1. Compute the value of $n_{t-1}$ given by the keystream bit $z_{t-2}$ as
$n_{t-1} = z_{t-2} \oplus h(n_{t+2}, l_{t+4}, l_{t+6}, l_{t+8}, l_{t+30}, l_{t+15}, l_{t+17}, l_{t+21}, n_{t+36}) \oplus l_{t+28} \oplus \bigl(\bigoplus_{i \in A'} n_{t+i-2}\bigr)$,
where $A' = \{6, 15, 17, 23, 28, 34\}$. Then compute $l_{t-1}$ by the LFSR updating
equation as $l_{t-1} = l_{t+39} \oplus l_{t+33} \oplus l_{t+24} \oplus l_{t+19} \oplus l_{t+14} \oplus l_{t+4}$, and deduce
from $n_{t-1}$, $l_{t-1}$ the value $k^*_{t-1}$ by the NFSR updating equation as
$k^*_{t-1} = n_{t+39} \oplus c^4_{t-1} \oplus l_{t-1} \oplus g(N^{t-1})$.

2. Compute the value of $u_{t-1} = l_{t+3} \oplus l_{t+20} \oplus l_{t+36} \oplus n_{t+8} \oplus n_{t+19} \oplus n_{t+28}$ and
combine it with the value of $k^*_{t-1}$ obtained in Step 1:

- if $u_{t-1} = 0$ and $k^*_{t-1} = 0$, set $t \to t - 1$ and go back to Step 1.

- if $u_{t-1} = 0$ and $k^*_{t-1} = 1$, there is a contradiction; conclude that this guess
for the state is not correct and stop.

- if $u_{t-1} = 1$ and $k^*_{t-1} = 0$, check if $k_{(t-1) \bmod 80}$ has already been set in $K$.
If not, set it to 0. Set $t \to t - 1$ and go back to Step 1. Otherwise, if there is a
contradiction, conclude that this guess for the state is not correct and stop.

- if $u_{t-1} = 1$ and $k^*_{t-1} = 1$, check if $k_{(t-1) \bmod 80}$ has already been set in $K$.
If not, set it to 1. Set $t \to t - 1$ and go back to Step 1. Otherwise, if there is a
contradiction, conclude that this guess for the state is not correct and stop.
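
The case analysis of Step 2 amounts to a small consistency check on a partially known key, using the relation $k^*_t = k_{t \bmod 80} \cdot u_t$. The following is a minimal sketch of that bookkeeping only; it assumes the pairs $(u_t, k^*_t)$ have already been computed by clocking the candidate state backwards, and the names `absorb_observation`, `check_candidate` and `observations` are hypothetical.

```python
def absorb_observation(key_guess, t, u, k_star, key_len=80):
    """Apply one backward-clock observation (u_t, k*_t) to a partial key.

    key_guess maps key-bit index -> bit for the bits fixed so far.
    Returns False if the observation contradicts the candidate.
    """
    if u == 0:
        # k*_t = k_{t mod 80} * u_t must be 0 when u_t = 0.
        return k_star == 0
    idx = t % key_len
    if idx in key_guess:
        return key_guess[idx] == k_star
    key_guess[idx] = k_star
    return True


def check_candidate(observations, key_len=80):
    """observations: iterable of (t, u_t, k*_t) triples (hypothetical input).

    Returns the recovered partial key, or None if the candidate is rejected.
    """
    key_guess = {}
    for t, u, k_star in observations:
        if not absorb_observation(key_guess, t, u, k_star, key_len):
            return None
    return key_guess
```

A rejected candidate returns `None` as soon as the first contradiction appears, which mirrors the "stop" branches of the mechanism.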

Similar to the statements in [8], the probability that a state candidate survives
for $2r$ clocks is $2^{-r}$. On average, for each 2 clocks, half of the possible guesses will
be eliminated. For $2^s$ candidate states, the average number of clocks for each
elimination is

$$\sum_{i=0}^{s} \frac{2i \times 2^{s-i}}{2^{s}} = \sum_{i=0}^{s} \frac{i}{2^{i-1}} \approx 4.$$

We can conclude that 4 clocks of output are enough for checking the validity of a
candidate state and the recovery of the key bits for each candidate.
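
The figure of 4 clocks can be checked numerically. The sketch below assumes only what is stated above, namely that a candidate survives $2r$ clocks with probability $2^{-r}$, so that a fraction $2^{-r}$ of the candidates is discarded after exactly $2r$ clocks; the function name is ours.

```python
def average_clocks_per_elimination(s):
    """Expected number of clocks spent on one candidate out of 2^s.

    Assumption: a fraction 2^-r of the candidates is discarded after 2r clocks.
    """
    return sum(2 * r * 2.0 ** (-r) for r in range(1, s + 1))


if __name__ == "__main__":
    for s in (1, 5, 10, 40):
        print(s, average_clocks_per_elimination(s))
```

The partial sums increase monotonically towards 4, which is the limit of $\sum_{r \ge 1} r / 2^{r-1}$.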
Next we illustrate the algorithm for the internal state and key recovery in
the processing phase.

Processing Phase. Given the parameters $x, y$, the corresponding $Count(|C'|) \times
2^{13}$ tables $T_{c',i}$ and the given keystream sample $\{z_t\}_{t \ge 0}$, the processing steps
are as follows:

1. Search the keystream sequence $\{z_t\}_t$ for the next non-considered block of 3
zeros. If there are no more blocks, output a flag that the algorithm has failed
to recover the key.

2. For each detected block, compute the corresponding counter arrays $C$, $C'$ and
$C^*$ from the time $t$. For $i = 1, \ldots, 2^{13}$, compare the $x$-bit segment of the
keystream subsequent to the block and the $y$-bit segment prior to the block with
the memorized $(x + y)$-bit segments in the second column (the sub-row is indexed
by $C^*$) of the table $T_{c',i}$, and do the following:

- If the matching does not exist, go to the processing Step 1.

- If the $(x + y)$-bit sample segments match with a segment in table $T_{c',i}$, go
to Step 3.

3. Read the corresponding state, check whether it is a correct state or not and
recover the secret key by the State Checking and Key Recovery Mechanism
stated above. If this state survives, recover and output the key; otherwise go to
Step 1.
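
Steps 1 and 2 start from a simple scan of the keystream. As an illustration, the sketch below extracts, for every position $t$ with $z_{t-1} z_t z_{t+1} = 000$, the $y$ bits $z_{t-y-1}, \ldots, z_{t-2}$ before the block and the $x$ bits $z_{t+2}, \ldots, z_{t+x+1}$ after it. The function name and the list-of-bits representation are our assumptions; the table look-up itself is not modelled.

```python
def zero_block_windows(z, x, y):
    """Yield (t, before, after) for every t with z[t-1] = z[t] = z[t+1] = 0.

    before = (z[t-y-1], ..., z[t-2]) and after = (z[t+2], ..., z[t+x+1]);
    positions too close to either end of the sample are skipped.
    """
    for t in range(y + 1, len(z) - x - 1):
        if z[t - 1] == 0 and z[t] == 0 and z[t + 1] == 0:
            before = tuple(z[t - y - 1 : t - 1])
            after = tuple(z[t + 2 : t + x + 2])
            yield t, before, after


if __name__ == "__main__":
    sample = [1, 1, 0, 0, 0, 1, 0, 1]
    print(list(zero_block_windows(sample, x=2, y=1)))
```

Each yielded pair `(before, after)` is the $(x+y)$-bit segment that would be compared against the second column of the tables.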

Theorem 2. For two positive integers $x, y$, the dedicated TMD tradeoff on
Sprout has complexities as follows: (1) The data complexity for the processing is
$D = 2^{9+x+y}$; (2) The expected memory $M$ (in bits) of pre-processing is computed as
follows:

$$
M = \begin{cases}
Count(|C'|) \times 2^{71-x-y} \times \Bigl[61 + \dfrac{Count(x+y)}{Count(|C'|)} \cdot (x + y)\Bigr], & \text{if } x + y < 30, \\[2ex]
Count(|C'|) \times 2^{71-x-y} \times \Bigl[61 + \dfrac{Count(x+y)}{Count(|C'|)} \cdot (2x + 2y - 58)\Bigr], & \text{if } x + y \ge 30.
\end{cases}
$$

(3) The time complexity of processing is $2^{70.66-x-y}$ Sprout encryptions along
with $2^{6+x+y}$ table look-ups. (4) The time complexity of pre-processing is proportional
to $2^{74-x-y}$.


Proof. The data complexity is determined by the probability that an internal
state $(L^t, N^t)$ is a special state (such that $(L^t, N^t) \in V$ and the conditions (a),
(b) and (c) hold), which is given in the proof of Lemma 2 as $2^{-(9+x+y)}$. Thus
we have $D = 2^{9+x+y}$.

As for the memory, we need $Count(|C'|) \times 2^{13}$ tables $T_{c',i}$, each having
$2^{58-x-y}$ rows in the first column to store 61-bit "special" states (3+16=19 bits
are fixed for each table) and $2^{58-x-y} \times \frac{Count(x+y)}{Count(|C'|)}$ rows in the second column
to store the corresponding output bits. If $x + y < 30$, each row in the second
column contains $(x + y)$ output bits; otherwise, each row in the second column
contains $2(x + y) - 58$ output bits, indexed by $58 - x - y$ bits of the output.
Hence the memory $M$ (in bits) is computed as follows:

$$
M = \begin{cases}
Count(|C'|) \times 2^{13} \times 2^{58-x-y} \times \Bigl[61 + \dfrac{Count(x+y)}{Count(|C'|)} \cdot (x + y)\Bigr], & \text{if } x + y < 30, \\[2ex]
Count(|C'|) \times 2^{13} \times 2^{58-x-y} \times \Bigl[61 + \dfrac{Count(x+y)}{Count(|C'|)} \cdot (2x + 2y - 58)\Bigr], & \text{otherwise.}
\end{cases}
$$

In the processing phase, the expected number of table look-ups is determined
by the probability that an internal state $(L^t, N^t)$ is a special state when the
keystream segment $z_t^{(3)} = 0^3$, which is given in Lemma 2 as $2^{-(6+x+y)}$. Thus
the number of table look-ups is $2^{6+x+y}$. For each $(x + y)$-bit keystream segment,
we have $2^{71-2(x+y)}$ state candidates producing the output. As stated before, 4
more clocks of output are enough for checking the validity of the state and the
recovery of the key bits for each candidate. In total, the time complexity is
$2^{6+x+y} \times 2^{71-2(x+y)} \times 4 = 2^{79-x-y}$, which is equivalent to $2^{70.66-x-y}$
Sprout encryptions.

The pre-processing time complexity is equivalent to solving the constructed
system of equations. We see that by making at most $2^{74-(x+y)}$ guesses for appropriate
unknowns we can solve the system for each possible counter array $C'$. As the
counter values are added to the systems linearly, we can do the Gauss elimination
only once to store separate tables for each of the $Count(|C'|)$ counter arrays. □
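
As a sanity check, the complexities of Theorem 2 can be evaluated for concrete parameters. The sketch below is illustrative only: it assumes the Sprout counter model used for Table 1 (bit of weight 16 of a modulo-80 counter), $|C| = x + y$ and $|C'| = x + y - 20$, and the function names are ours.

```python
from math import log2


def count_counter_arrays(n, period=80, bit=4):
    """Distinct length-n windows of the cyclic counter-bit sequence (assumed model)."""
    seq = [(c >> bit) & 1 for c in range(period)]
    return len({tuple(seq[(s + i) % period] for i in range(n)) for s in range(period)})


def tradeoff(x, y):
    """Return log2 of data, memory (bits) and time for parameters x, y."""
    s = x + y
    n_cprime = count_counter_arrays(s - 20)   # number of arrays C'
    n_c = count_counter_arrays(s)             # number of arrays C
    out_bits = s if s < 30 else 2 * s - 58    # bits stored per sub-row
    memory_bits = n_cprime * 2 ** (71 - s) * (61 + n_c / n_cprime * out_bits)
    return {
        "log2_data": 9 + s,
        "log2_memory_bits": log2(memory_bits),
        "log2_time": 70.66 - s,
    }


if __name__ == "__main__":
    for x, y in ((16, 14), (16, 15), (17, 15), (17, 16), (18, 16)):
        print((x, y), tradeoff(x, y))
```

For the five parameter pairs above, the memory exponents evaluate to about 51.39, 50.63, 49.85, 49.03 and 48.20, in line with the memory column of Table 2.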


4.3 Detailed Workload for x = 16, y = 15 

We now focus on the workload to solve the system of equations for $x = 16$,
$y = 15$. For a state $(L^t, N^t)$, let $n_{t+j} = 0$, $j = 3, 4, 5$ and define $N^{*t} =
N^t \setminus \{n_{t+3}, n_{t+4}, n_{t+5}\}$. We need to solve the following system of equations,
which amounts to 34 linear equations, 11 non-linear equations and 88 unknowns
$L^t, N^{*t}, n_{t+40}, n_{t+41}, \ldots, n_{t+44}, n_{t-1}, n_{t-2}, \ldots, n_{t-6}$:

$$
\begin{array}{rl}
1: & l_{t+29} \oplus n_t \oplus n_{t+5} \oplus n_{t+14} \oplus n_{t+16} \oplus n_{t+22} \oplus n_{t+27} \oplus n_{t+33} = 0 \\
2: & l_{t+30} \oplus n_{t+1} \oplus n_{t+6} \oplus n_{t+15} \oplus n_{t+17} \oplus n_{t+23} \oplus n_{t+28} \oplus n_{t+34} = 0 \\
3: & l_{t+31} \oplus n_{t+2} \oplus n_{t+7} \oplus n_{t+16} \oplus n_{t+18} \oplus n_{t+24} \oplus n_{t+29} \oplus n_{t+35} = 0 \\
4: & u_{t-15} = l_{t-11} \oplus l_{t+6} \oplus l_{t+22} \oplus n_{t-6} \oplus n_{t+5} \oplus n_{t+14} = 0 \\
5: & u_{t-14} = l_{t-10} \oplus l_{t+7} \oplus l_{t+23} \oplus n_{t-5} \oplus n_{t+6} \oplus n_{t+15} = 0 \\
 & \vdots \\
18: & u_{t-1} = l_{t+3} \oplus l_{t+20} \oplus l_{t+36} \oplus n_{t+8} \oplus n_{t+19} \oplus n_{t+28} = 0 \\
19: & u_t = l_{t+4} \oplus l_{t+21} \oplus l_{t+37} \oplus n_{t+9} \oplus n_{t+20} \oplus n_{t+29} = 0 \\
20: & u_{t+1} = l_{t+5} \oplus l_{t+22} \oplus l_{t+38} \oplus n_{t+10} \oplus n_{t+21} \oplus n_{t+30} = 0 \\
 & \vdots \\
34: & u_{t+15} = l_{t+19} \oplus l_{t+36} \oplus l_{t+52} \oplus n_{t+24} \oplus n_{t+35} \oplus n_{t+44} = 0 \\
35: & n_{t+40} \oplus l_t \oplus c^4_t \oplus g(N^t) = 0 \\
36: & n_{t+41} \oplus l_{t+1} \oplus c^4_{t+1} \oplus g(N^{t+1}) = 0 \\
 & \vdots \\
39: & n_{t+44} \oplus l_{t+4} \oplus c^4_{t+4} \oplus g(N^{t+4}) = 0 \\
40: & n_{t-1} \oplus l_{t-1} \oplus c^4_{t-1} \oplus g'(N^t) = 0 \\
41: & n_{t-2} \oplus l_{t-2} \oplus c^4_{t-2} \oplus g'(N^{t-1}) = 0 \\
 & \vdots \\
45: & n_{t-6} \oplus l_{t-6} \oplus c^4_{t-6} \oplus g'(N^{t-5}) = 0
\end{array}
$$

In the following part, $L^t$ is treated as a column vector of size 40. First of all, we
choose the 40 equations numbered 1, 2, ..., 34 and 40, 41, ..., 45 from the above
system to represent $L^t$ by the unknowns $N^{*t}, n_{t+40}, n_{t+41}, \ldots, n_{t+44}, n_{t-1}, n_{t-2}, \ldots,
n_{t-6}$ as $M \cdot L^t = v$, where $M$ is the $40 \times 40$ coefficient matrix of $L^t$, $v$ is a
column vector of size 40, and

$$M \cdot L^t = [\,l_{t+29},\ l_{t+30},\ l_{t+31},\ l_{t-11} \oplus l_{t+6} \oplus l_{t+22},\ \ldots,\ l_{t+19} \oplus l_{t+36} \oplus l_{t+52},\ l_{t-1},\ \ldots,\ l_{t-6}\,]^T,$$

and

$$v = [\,\textstyle\bigoplus_{i \in B} n_{t+i-1},\ \ldots,\ n_{t-6} \oplus n_{t+5} \oplus n_{t+14},\ \ldots,\ n_{t+24} \oplus n_{t+35} \oplus n_{t+44},\ n_{t-1} \oplus c^4_{t-1} \oplus g'(N^t),\ \ldots,\ n_{t-6} \oplus c^4_{t-6} \oplus g'(N^{t-5})\,]^T.$$

We have checked that $\mathrm{rank}(M) = 39$. Taking $l_t$ as a free variable, we obtain an
invertible coefficient matrix of size $39 \times 39$. Let $L'^t = L^t \setminus \{l_t\}$; then each variable
in $L'^t$ can be uniquely represented as a linear combination of $N^{*t}, n_{t+40}, n_{t+41}, \ldots,
n_{t+44}, n_{t-1}, n_{t-2}, \ldots, n_{t-6}$ and $l_t$, together with 1 non-linear equation in these
unknowns. Plugging in the values $l_{t+1}, l_{t+2}, l_{t+3}, l_{t+4}$ in the equations numbered
36, ..., 39, we get a system with 6 non-linear equations and 49 unknowns $N^{*t}, n_{t+40},
n_{t+41}, \ldots, n_{t+44}, n_{t-1}, n_{t-2}, \ldots, n_{t-6}$ and $l_t$. Define a set $GUESS = \{n_{t+j} : j \in S\}$ of
size 33, where

S = {-1, 0, 1, 3, 4, 6, 7, 9, 11, 12, 13, 15, 16, 17, 19, 20, 21, 

23, 24, 25, 26, 28, 30, 31, 32, 33, 34, 35, 36, 37, 39, 40, 41}. 

By guessing the 33 unknowns in the set $GUESS$,

- If $n_{t+9} = 0$, we come up with $2^{32}$ systems with 6 linear equations and 16
unknowns. For each of these systems, we do the Gauss elimination once by
choosing an invertible coefficient matrix of size $6 \times 6$. The systems can be solved
with $2^{32} \times (6^3 + 2^{10}) = 2^{42.27}$ basic operations.

- If $n_{t+9} = 1$, we further guess $n_{t+8}$, thus we get $2^{33}$ systems with 6 linear
equations and 15 unknowns. Similarly, the systems can be solved with $2^{33} \times
(6^3 + 2^9) = 2^{42.51}$ basic operations.

In total, the pre-computation is approximately $2^{43.39}$ basic operations.
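
The arithmetic behind these three exponents is easy to re-run. The following sketch simply evaluates the two operation counts quoted above and their sum; the function name is ours.

```python
from math import log2


def precomputation_workload():
    """Return log2 of the two case workloads and of their total."""
    case_zero = 2**32 * (6**3 + 2**10)   # n_{t+9} = 0: 6x6 elimination + 2^10 per system
    case_one = 2**33 * (6**3 + 2**9)     # n_{t+9} = 1: one more guessed bit
    return log2(case_zero), log2(case_one), log2(case_zero + case_one)


if __name__ == "__main__":
    print([round(v, 2) for v in precomputation_workload()])
```

The computed values agree with the quoted exponents up to rounding in the last digit.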

We list in Table 2 more instances that illustrate the complexities of the TMD
tradeoff attacks on Sprout. A comparison of our TMD tradeoff attacks with
the previous ones in [8] and [12] is presented in Table 3. With carefully chosen
attack parameters, our method is at least $2^{20}$ times faster than the attack in
[12], and $2^{10}$ times faster than the attack in [8] with much less memory.
Table 2. The complexities of the attack on Sprout

x, y     Count(x+y)   Data       Memory (bits), (TB)           Time          Pre-computation
16, 14   59           $2^{39}$   $2^{51.39}$ bits, 336 TB      $2^{40.66}$   $2^{44.03}$
16, 15   61           $2^{40}$   $2^{50.63}$ bits, 198 TB      $2^{39.66}$   $2^{43.39}$
17, 15   63           $2^{41}$   $2^{49.85}$ bits, 115 TB      $2^{38.66}$   $2^{43.81}$
17, 16   64           $2^{42}$   $2^{49.03}$ bits, 65 TB       $2^{37.66}$   $2^{45.36}$
18, 16   65           $2^{43}$   $2^{48.20}$ bits, 36 TB       $2^{36.66}$   $2^{47.09}$

Table 3. Comparison of our time/memory/data tradeoff attacks with the previous
ones

Attack   Data       Memory (bits), (TB)               Time          Pre-computation
[12]     112        > $2^{52.32}$ bits, > 639 TB      $2^{66.80}$   $2^{68.87}$
[8]      $2^{40}$   $2^{52.58}$ bits, 770 TB          $2^{30.66}$   $2^{54.29}$
[8]      $2^{41}$   $2^{52.64}$ bits, 399 TB          $2^{29.66}$   ≈ $2^{56.70}$
[8]      $2^{42}$   $2^{50.69}$ bits, 207 TB          $2^{28.66}$   ≈ $2^{59.07}$
[8]      $2^{43}$   $2^{49.74}$ bits, 108 TB          $2^{27.66}$   ≈ $2^{61.42}$
ours     $2^{39}$   $2^{51.39}$ bits, 336 TB          $2^{40.66}$   $2^{44.03}$
ours     $2^{40}$   $2^{50.63}$ bits, 198 TB          $2^{39.66}$   $2^{43.39}$
ours     $2^{41}$   $2^{49.85}$ bits, 115 TB          $2^{38.66}$   $2^{43.81}$
ours     $2^{42}$   $2^{49.03}$ bits, 65 TB           $2^{37.66}$   $2^{45.36}$
ours     $2^{43}$   $2^{48.20}$ bits, 36 TB           $2^{36.66}$   $2^{47.09}$


5 Practical Implementation 

To verify the validity of our attack, we experimentally test it on a reduced cipher 
with similar structure and properties as Sprout. In general, the simulation results 
match well with the theoretical estimates. 


5.1 The Reduced Version of Sprout 


Similarly, there is an 8-bit counter register, of which the lower 6 bits are a
modulo 40 counter, denoted by $(c^5_t, c^4_t, c^3_t, c^2_t, c^1_t, c^0_t)$ at a given round $t$. The 3-th
LSB $c^3_t$ of the counter is employed in the keystream generation. It should be
noted that $c^3_t$ has a cycle of length 40, i.e., in each cycle, this bit takes the
values $0^8\, 1^8\, 0^8\, 1^8\, 0^8$, that is, five runs of 8 identical bits each.

The reduced version of Sprout uses a 20-bit LFSR and a 20-bit NFSR. At
time $t$, the LFSR state is $L^t = [l_t, l_{t+1}, \ldots, l_{t+19}]$, and it is updated recursively
by $f$ as $l_{t+20} = l_t \oplus l_{t+1} \oplus l_{t+14} \oplus l_{t+15} \oplus l_{t+16} \oplus l_{t+19}$. The NFSR state $N^t =
[n_t, n_{t+1}, \ldots, n_{t+19}]$ is updated recursively by a nonlinear feedback function $g$ as




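$$
\begin{aligned}
n_{t+20} &= k^*_t \oplus c^3_t \oplus l_t \oplus g(N^t) \\
&= k^*_t \oplus c^3_t \oplus l_t \oplus n_t \oplus n_{t+13} \oplus n_{t+15} \oplus n_{t+17} \oplus n_{t+19} \\
&\quad \oplus n_{t+2} n_{t+5} \oplus n_{t+3} n_{t+7} \oplus n_{t+8} n_{t+9} \oplus n_{t+1} n_{t+14} \oplus n_{t+16} n_{t+18} \oplus n_{t+9} n_{t+12} \\
&\quad \oplus n_{t+13} n_{t+16} n_{t+17} n_{t+18} \oplus n_{t+10} n_{t+11} n_{t+12} \oplus n_{t+4} n_{t+7} n_{t+11}.
\end{aligned}
$$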

Let $u_t$ be $u_t = l_{t+1} \oplus l_{t+4} \oplus l_{t+17} \oplus n_{t+4} \oplus n_{t+10} \oplus n_{t+14}$; then

$$
k^*_t = \begin{cases}
k_t, & 0 \le t \le 39 \\
k_{t \bmod 40} \cdot u_t, & \text{otherwise.}
\end{cases}
$$

Given the internal state at time $t$, the keystream bit $z_t$ is generated as
$z_t = h(n_{t+4}, l_{t+6}, l_{t+8}, l_{t+10}, l_{t+12}, l_{t+17}, l_{t+19}, l_{t+3}, n_{t+18}) \oplus l_{t+10} \oplus \bigl(\bigoplus_{i \in A} n_{t+i}\bigr)$,
where $A = \{1, 3, 6, 15, 17\}$, and the filter function $h(\cdot)$ is defined as

$$h(\cdot) = n_{t+4} l_{t+6} \oplus l_{t+8} l_{t+10} \oplus l_{t+12} l_{t+17} \oplus l_{t+19} l_{t+3} \oplus n_{t+4} l_{t+12} n_{t+18}.$$

During the key/IV setup phase, since the key is fixed, first load the IV in
the following way: $n_i = iv_i$, $0 \le i \le 19$; $l_i = iv_{i+20}$, $0 \le i \le 14$; $l_i = 1$, $15 \le
i \le 18$; and $l_{19} = 0$. Then run the cipher 160 rounds as follows.

- the LFSR update function is changed to $l_{t+20} = z_t \oplus f(L^t)$.

- the NFSR update function is changed to $n_{t+20} = z_t \oplus k^*_t \oplus l_t \oplus c^3_t \oplus g(N^t)$.

- no keystream bit is generated. 

After the initialization phase, the keystream generation phase starts and there 
is no feedback keystream anymore. 
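
The attack later clocks the LFSR backwards, so it may help to see the forward recurrence $l_{t+20} = l_t \oplus l_{t+1} \oplus l_{t+14} \oplus l_{t+15} \oplus l_{t+16} \oplus l_{t+19}$ next to its inverse. The sketch below models only the keystream-phase LFSR of the reduced cipher (no NFSR, no key); states are lists $[l_t, \ldots, l_{t+19}]$ and the function names are ours.

```python
TAPS = (0, 1, 14, 15, 16, 19)  # l_{t+20} is the XOR of l_{t+i} for i in TAPS


def lfsr_forward(state):
    """Map [l_t, ..., l_{t+19}] to [l_{t+1}, ..., l_{t+20}]."""
    new_bit = 0
    for i in TAPS:
        new_bit ^= state[i]
    return state[1:] + [new_bit]


def lfsr_backward(state):
    """Map [l_t, ..., l_{t+19}] to [l_{t-1}, ..., l_{t+18}].

    Solving the recurrence for the oldest bit gives
    l_{t-1} = l_{t+19} ^ l_{t+18} ^ l_{t+15} ^ l_{t+14} ^ l_{t+13} ^ l_t.
    """
    prev_bit = state[19] ^ state[18] ^ state[15] ^ state[14] ^ state[13] ^ state[0]
    return [prev_bit] + state[:-1]


if __name__ == "__main__":
    s = [int(b) for b in "11110000010001110000"]
    assert lfsr_backward(lfsr_forward(s)) == s
    print(lfsr_forward(s))
```

Because the feedback polynomial has a nonzero constant tap on $l_t$, every state has exactly one predecessor, which is what makes the backward clocking in the state-checking mechanism well defined.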

5.2 Attack Process 

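Suppose $l_{t+4} = 0$ and $n_{t+3} = n_{t+4} = n_{t+5} = 0$. By a computer computation,
there are 1728 ($> 2^{10}$) possible values for the following 13 bits of the LFSR such
that the three filter outputs $h(\cdot)$ at times $t-1$, $t$, $t+1$ are $\mathbf{0}$ ($\in \mathbb{F}_2^3$):

$$P^t = [\,l_{t+2} \,\|\, l_{t+3}\, l_{t+7}\, l_{t+8}\, l_{t+9} \,\|\, l_{t+10}\, l_{t+11}\, l_{t+12}\, l_{t+13} \,\|\, l_{t+16}\, l_{t+17}\, l_{t+18}\, l_{t+19}\,].$$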
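
The count of 1728 can be reproduced by brute force over the $2^{13}$ assignments. The sketch below is our reading of the condition, not the authors' program: with $l_{t+4} = n_{t+3} = n_{t+4} = n_{t+5} = 0$, every monomial of $h$ containing one of these bits vanishes, and we require the remaining quadratic parts of $h$ at times $t-1$, $t$ and $t+1$ to be zero. The integer encoding of $P^t$ (first listed bit as most significant) is an assumption and need not match the hexadecimal convention used for the $a_i$.

```python
from itertools import product

# Offsets (relative to t) of the 13 LFSR bits collected in P^t.
P_OFFSETS = (2, 3, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19)


def filter_outputs_zero(l):
    """l maps offset -> bit; True if h vanishes at times t-1, t and t+1."""
    h_prev = (l[7] & l[9]) ^ (l[11] & l[16]) ^ (l[18] & l[2])
    h_curr = (l[8] & l[10]) ^ (l[12] & l[17]) ^ (l[19] & l[3])
    h_next = (l[9] & l[11]) ^ (l[13] & l[18])  # the l_{t+20} l_{t+4} term is 0
    return h_prev == 0 and h_curr == 0 and h_next == 0


def admissible_values():
    """All 13-bit values of P^t (assumed encoding) satisfying the condition."""
    values = []
    for bits in product((0, 1), repeat=len(P_OFFSETS)):
        if filter_outputs_zero(dict(zip(P_OFFSETS, bits))):
            values.append(int("".join(map(str, bits)), 2))
    return values


if __name__ == "__main__":
    print(len(admissible_values()))
```

Under this reading the enumeration yields exactly 1728 admissible values, of which the attack uses 1024.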

For example, $P^t$ = 0x0000, 0x0001, ... We denote all the 1728 values of $P^t$
as $a_1, a_2, \ldots, a_{1024}, \ldots, a_{1728}$, where the first 1024 values are $a_1$ = 0x0000, $a_2$ =
0x0001, ..., $a_{1024}$ = 0x1ba1. For convenience, several notations are defined as
follows:

- $L^{*t} = L^t \setminus (\{l_{t+4}\} \cup P^t)$ of 6 bits.

- $N^{*t} = N^t \setminus \{n_{t+3}, n_{t+4}, n_{t+5}\}$ of 17 bits.

- Define $C = [c^3_{t-5}, \ldots, c^3_{t-1}, c^3_t, c^3_{t+1}, \ldots, c^3_{t+5}]$ of length 11, the employed counter
array. There are 21 different counter arrays, denoted by hexadecimal numbers;
they are

0x007, 0x00f, 0x01f, 0x03f, 0x07f, 0x0ff, 0x1fe,
0x3fc, 0x7f8, 0x7f0, 0x7e0, 0x7c0, 0x780, 0x700,
0x601, 0x403, 0x600, 0x400, 0x000, 0x001, 0x003.

- Define $C' = [c^3_{t-1}]$ of length 1, and $C^* = [c^3_{t-5}, \ldots, c^3_{t-2}, c^3_t, c^3_{t+1}, \ldots, c^3_{t+5}]$ of
length 10. There are 2 different values for $C'$, denoted as $c'_1$ = 0x0, $c'_2$ = 0x1.
If $c^3_{t-1}$ = 0x0, there are 13 different values for $C^*$; they are 0x007, 0x00f,
0x01f, 0x03f, 0x3c0, 0x380, 0x301, 0x203, 0x300, 0x200, 0x000, 0x001, 0x003. If
$c^3_{t-1}$ = 0x1, there are 8 different values for $C^*$; they are 0x03f, 0x07f, 0x0fe,
0x1fc, 0x3f8, 0x3f0, 0x3e0, 0x3c0.
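
The lists of counter arrays can be regenerated mechanically. The sketch below assumes, as described in Sect. 5.1, that the round constant is bit 3 of a modulo-40 counter, writes each array with its first entry as the most significant bit, and splits $C$ into $C'$ and $C^*$; the helper names are ours.

```python
def counter_windows(period=40, bit=3, before=5, after=5):
    """All distinct arrays [c_{t-before}, ..., c_{t+after}] of one counter bit."""
    seq = [(c >> bit) & 1 for c in range(period)]
    return {
        tuple(seq[(t + d) % period] for d in range(-before, after + 1))
        for t in range(period)
    }


def split_window(window, before=5):
    """Split C into C' = c_{t-1} and C* = C with c_{t-1} removed."""
    pos = before - 1  # index of c_{t-1} inside the window
    return window[pos], window[:pos] + window[pos + 1:]


def to_int(bits):
    """Interpret a bit tuple as an integer, first entry most significant."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


windows = counter_windows()
c_star = {0: set(), 1: set()}
for w in windows:
    c_prime, rest = split_window(w)
    c_star[c_prime].add(to_int(rest))

if __name__ == "__main__":
    print(len(windows), len(c_star[0]), len(c_star[1]))
    print(sorted(hex(v) for v in c_star[1]))
```

The enumeration gives 21 arrays $C$, of which 13 have $c^3_{t-1} = 0$ and 8 have $c^3_{t-1} = 1$, matching the lists above.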

Pre-processing Phase. For any state $(L^t, N^t)$, suppose $l_{t+4} = 0$ and $n_{t+3} =
n_{t+4} = n_{t+5} = 0$. In the pre-processing phase, we construct $2 \times 2^{10}$ tables
$T_{c'_j, a_i}$ indexed with $c'_j$ and $a_i$, for $c'_1$ = 0x0, $c'_2$ = 0x1 and $a_1$ = 0x0000, $a_2$ =
0x0001, ..., $a_{1024}$ = 0x1ba1. In table $T_{c'_j, a_i}$, the 23-bit $(L^{*t}, N^{*t})$ are stored in the
first column of a row such that

$$
\begin{cases}
P^t = [\,l_{t+2} \,\|\, l_{t+3}\, l_{t+7}\, l_{t+8}\, l_{t+9} \,\|\, l_{t+10}\, l_{t+11}\, l_{t+12}\, l_{t+13} \,\|\, l_{t+16}\, l_{t+17}\, l_{t+18}\, l_{t+19}\,] = a_i \\
l_{t+10+i} \oplus n_{t+1+i} \oplus n_{t+3+i} \oplus n_{t+6+i} \oplus n_{t+15+i} \oplus n_{t+17+i} = 0, \quad i = -1, 0, 1 \\
u_{t+j} = l_{t+1+j} \oplus l_{t+4+j} \oplus l_{t+17+j} \oplus n_{t+4+j} \oplus n_{t+10+j} \oplus n_{t+14+j} = 0, \quad j = 0, 1, \ldots, 5 \\
u_{t-k} = l_{t+1-k} \oplus l_{t+4-k} \oplus l_{t+17-k} \oplus n_{t+4-k} \oplus n_{t+10-k} \oplus n_{t+14-k} = 0, \quad k = 1, 2, \ldots, 5 \\
n_{t-1} \oplus l_{t-1} \oplus c^3_{t-1} \oplus g'(N^t) = 0 \quad \text{(non-linear)}
\end{cases}
$$

Similarly, we can solve all the systems by choosing a set of unknowns as
$GUESS = \{n_t, n_{t+1}, n_{t+6}, n_{t+7}, n_{t+10}, n_{t+11}, n_{t+15}, n_{t+16}, n_{t+17}\}$ with approximately
$2^{23}$ basic operations. Besides, for each $(c'_j, a_i)$ pair, there are expected
$2^9$ solutions; we store the 23-bit $(L^{*t}, N^{*t})$ of the internal state (4+13=17
bits are fixed for each table) in the first column of a row in table $T_{c'_j, a_i}$.
Further, for this state and for each possible array of round constants $C^*$, get the
corresponding 11-bit output $(z_{t-6}, \ldots, z_{t-2}, z_{t+2}, \ldots, z_{t+7})$ and put it in the second
column as a sub-row indexed by $C^*$. The number of sub-rows is 13 for
$c'_1$ = 0x0, while the number is 8 for $c'_2$ = 0x1. In total, the memory needed
is $M = 2^{10} \times 2^9 \times (23 + 11 \times 13) + 2^{10} \times 2^9 \times (23 + 11 \times 8) \approx 2^{27.11}$ bits, i.e.,
17.3 MB$^1$.
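
The two memory figures for the reduced cipher (with all 11 output bits per sub-row, and with only 2 output bits per sub-row as in the footnote) follow from the same expression. A small sketch, with names of our choosing, that evaluates it:

```python
from math import log2

TABLES_PER_C_PRIME = 2**10      # one table per value a_1, ..., a_1024
ROWS_PER_TABLE = 2**9           # expected number of special states per table
STATE_BITS = 23                 # stored bits of (L*, N*)
SUB_ROWS = {0: 13, 1: 8}        # number of C* values for c' = 0x0 and c' = 0x1


def memory_bits(output_bits):
    """Total table size in bits when each sub-row stores `output_bits` bits."""
    return sum(
        TABLES_PER_C_PRIME * ROWS_PER_TABLE * (STATE_BITS + output_bits * n)
        for n in SUB_ROWS.values()
    )


if __name__ == "__main__":
    for bits in (11, 2):
        m = memory_bits(bits)
        print(bits, round(log2(m), 2), m / 8 / 2**20, "MB")
```

Here MB is taken as $2^{20}$ bytes, which is the convention under which the quoted 17.3 MB and 5.5 MB come out.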

Next, we present the State Checking and Key Recovery Mechanism specified 
for the reduced version of Sprout, which is similar to the one stated for Sprout. 

State Checking and Key Recovery Mechanism. For a candidate state at
time $t$, $L^t = [l_t, l_{t+1}, \ldots, l_{t+19}]$, $N^t = [n_t, n_{t+1}, \ldots, n_{t+19}]$, create a 40-bit vector
$K$ for the possible values associated with it:

1. Compute the value of $n_{t-1}$ given by the keystream bit $z_{t-2}$ as
$n_{t-1} = z_{t-2} \oplus h(n_{t+2}, l_{t+4}, l_{t+6}, l_{t+8}, l_{t+10}, l_{t+15}, l_{t+17}, l_{t+1}, n_{t+16}) \oplus l_{t+8} \oplus \bigl(\bigoplus_{i \in A'} n_{t+i-2}\bigr)$,
where $A' = \{3, 6, 15, 17\}$. Then compute $l_{t-1}$ by the LFSR updating equation
as $l_{t-1} = l_{t+19} \oplus l_{t+18} \oplus l_{t+15} \oplus l_{t+14} \oplus l_{t+13} \oplus l_t$, and deduce from $n_{t-1}$, $l_{t-1}$
the value $k^*_{t-1}$ by the NFSR updating equation as
$k^*_{t-1} = n_{t+19} \oplus c^3_{t-1} \oplus l_{t-1} \oplus g(N^{t-1})$.

2. Compute the value of $u_{t-1} = l_t \oplus l_{t+3} \oplus l_{t+16} \oplus n_{t+3} \oplus n_{t+9} \oplus n_{t+13}$ and
combine it with the value of $k^*_{t-1}$ obtained in Step 1:

- if $u_{t-1} = 0$ and $k^*_{t-1} = 0$, set $t \to t - 1$ and go back to Step 1.

- if $u_{t-1} = 0$ and $k^*_{t-1} = 1$, there is a contradiction; conclude that this guess
for the state is not correct and stop.

- if $u_{t-1} = 1$ and $k^*_{t-1} = 0$, check if $k_{(t-1) \bmod 40}$ has already been set in $K$.
If not, set it to 0. Set $t \to t - 1$ and go back to Step 1. Otherwise, if there is a
contradiction, conclude that this guess for the state is not correct and stop.

- if $u_{t-1} = 1$ and $k^*_{t-1} = 1$, check if $k_{(t-1) \bmod 40}$ has already been set in $K$.
If not, set it to 1. Set $t \to t - 1$ and go back to Step 1. Otherwise, if there is a
contradiction, conclude that this guess for the state is not correct and stop.

$^1$ Since each table is expected to have $2^9$ rows, we need only store 2 output bits in the
second column of each row, indexed by 9 bits of the output. Thus, the memory can
be reduced to $2^{10} \times 2^9 \times (23 + 2 \times 13) + 2^{10} \times 2^9 \times (23 + 2 \times 8) \approx 2^{25.46}$ bits, i.e.,
5.5 MB.

By utilizing the pre-computed tables and the given keystream sample, the 
processing phase is carried out as follows. 

The Internal State Recovery Algorithm. Given the $2 \times 2^{10}$ tables $T_{c'_j, a_i}$
and the keystream sample $\{z_t\}_{t \ge 0}$ having at least $2^{21}$ sample segments, the
processing steps are as follows:

1. Search the keystream sequence $\{z_t\}_{t \ge 6}$ for the next non-considered block of
3 zeros, i.e., $z_{t-1} z_t z_{t+1} = 000$. If there are no more blocks, output a flag that
the algorithm has failed.

2. For each detected block, compute the corresponding $C' = [c^3_{t-1}] = c'$ and
$C^* = [c^3_{t-5}, \ldots, c^3_{t-2}, c^3_t, c^3_{t+1}, \ldots, c^3_{t+5}] = c^*$ from the time $t$. For $a_1$ = 0x0000,
$a_2$ = 0x0001, ..., $a_{1024}$ = 0x1ba1, compare $(z_{t+2} z_{t+3} \cdots z_{t+7})$ after the zero-segment
and $(z_{t-6} z_{t-5} \cdots z_{t-2})$ before the zero-segment with the memorized
11-bit segments in the second column of a sub-row indexed by $c^*$ from the
tables $T_{c', a_i}$, and do the following:

- If the matching does not exist, go to the processing Step 1.

- If the 11-bit sample segments match with a segment in table $T_{c', a_i}$, go to
Step 3.

3. Read the corresponding state, check whether it is a correct state or not and
recover the secret key by the State Checking and Key Recovery Mechanism
stated before. If this state survives, recover and output the key; otherwise go to
Step 1.


5.3 Simulation Results 

Our attacks have been fully implemented on one core of a single PC, running
Windows 7 with an Intel Core i3-2120 CPU @ 3.30 GHz and 4.00 GB RAM. In
general, the experimental results match the theoretical analysis quite well. We
present the details as follows.

In our experiment, first of all, we constructed $2 \times 2^{10}$ tables indexed by
$(c'_j, a_i)$ pairs for $c'_1$ = 0x0, $c'_2$ = 0x1 and $a_1$ = 0x0000, $a_2$ = 0x0001, ..., $a_{1024}$ =
0x1ba1, storing the special internal states. We used $2 \times 2^{10}$ text files to store the
(State, Keystream$_1$, Keystream$_2$, ..., Keystream$_{Count(|C^*|)}$) tuples, named with
the corresponding $c'_j$ and $a_i$. Note that $Count(|C^*|) = 13$ for $c'_1$ = 0x0 and
$Count(|C^*|) = 8$ for $c'_2$ = 0x1. Experimental results show that there are 496 or
504 or 520 or 528 rows in each table, and in total 524448 ($\approx 2^{19}$) rows for $c'_1$ = 0x0
and 524128 ($\approx 2^{19}$) rows for $c'_2$ = 0x1. Thus the memory needed in the simulation
is $524448 \times (23 + 11 \times 13) + 524128 \times (23 + 11 \times 8) \approx 2^{27.11}$ bits, i.e., 17.3 MB,
which matches the theoretical estimate quite well.

For the key recovery algorithm illustrated above, the data complexity is
estimated by the probability that an internal state $(L^t, N^t)$ is a special state
satisfying:

(1) $l_{t+4} = 0$, $n_{t+3} = n_{t+4} = n_{t+5} = 0$,

(2) $P^t = a_1$ or $P^t = a_2$ or ... or $P^t = a_{1024}$,

(3) $l_{t+10+d} \oplus \bigl(\bigoplus_{i \in A} n_{t+i+d}\bigr) = 0$ for $d = -1, 0, 1$,

(4) $u_{t+j} = 0$ for $j = 0, 1, \ldots, 5$,

(5) $u_{t-k} = 0$ for $k = 1, 2, \ldots, 5$.
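
The probability of these five conditions can be spelled out as a product, assuming (as the estimate implicitly does) that the internal state is uniformly distributed and that the conditions behave independently. Condition (1) fixes 4 bits, condition (2) selects $2^{10}$ of the $2^{13}$ values of $P^t$, and conditions (3), (4) and (5) impose 3, 6 and 5 linear equations respectively:

$$
\Pr[\text{special state}] = \underbrace{2^{-4}}_{(1)} \times \underbrace{\frac{2^{10}}{2^{13}}}_{(2)} \times \underbrace{2^{-3}}_{(3)} \times \underbrace{2^{-6}}_{(4)} \times \underbrace{2^{-5}}_{(5)} = 2^{-21}.
$$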

Thus the theoretical estimate is $D = 2^{21}$. In the experiment, we used the RC4
cipher to randomly generate $2^{15}$ $(K, IV)$ pairs, and for each randomly chosen
$(K, IV)$ pair, we ran the cipher and generated $2^{21}$ keystream bits. Results show
that we can get a special state at time $t < 2^{21}$ for 20423 ($\approx 2^{14.32}$) $(K, IV)$ pairs.
For example, suppose the $(K, IV)$ pair is

K = 1010100101011001101010110010011000110110
IV = 11010101101001001110100110010111011

where the left-most bit represents the value for index 0. At time $t = 580697$ ($\approx
2^{19.14}$), a special state arises in table $T_{c', a_{140}}$, where $c'$ = 0x0 and $a_{140}$ = 0x0191,
such that (1), (3), (4) and (5) hold and $P^t$ = 0x0191. This internal state is

$L^t$ = 11110000010001110000
$N^t$ = 00100000011011010110

In the internal state and key recovery algorithm, we search the keystream
sequence for the blocks of 3 zeros, and for each block, we try to find matching
pairs, and further recover the key. In the experiment, we first searched the given
keystream sequence and collected the time instances $t$ implying 3 zeros. The
expected number of such instances is $2^{21} \times 2^{-3} = 2^{18}$. Besides, for each 11-bit
output, the expected number of candidate states producing this output is
$2^{19} / 2^{11} = 2^8$. Thus we go through all the time instances, and for each time instance,
we go through all the candidate states. We have also verified by experiments
that 4 more clocks of output are enough for checking the validity of the state and
the recovery of the key bits for each candidate. In total, the estimate of the time
complexity is $2^{18} \times 2^8 \times 4 = 2^{28}$. In the simulation, for the $(K, IV)$ pair above,
we have recovered all the key bits within 1 hour.

6 Conclusion 

In this paper, we have studied the security of Sprout-like stream ciphers in a
unified framework from the viewpoint of $k$-normality of the augmented function.
We made a systematic security analysis based on this property and developed
a dedicated TMD tradeoff attack framework for such designs. In particular, it
is shown that Sprout can be broken by various TMD tradeoffs. Our attack is
highly flexible and compares favorably to all the previous attacks on Sprout,
which demonstrates the advantage of the new method. We believe that stream
ciphers with shorter internal state may suffer from time/memory/data tradeoff
attacks, and that the $k$-normality of the augmented function should be taken into
account for new stream cipher designs.
Abstract. In May 2012, a highly advanced malware for espionage
dubbed Flame was found targeting the Middle-East. As it turned out, 
it used a forged signature to infect Windows machines by MITM-ing 
Windows Update. Using counter-cryptanalysis, Stevens found that the 
forged signature was made possible by a chosen-prefix attack on MD5 
[25]. He uncovered some details that prove that this attack differs from 
collision attacks in the public literature, yet many questions about tech- 
niques and complexity remained unanswered. 

In this paper, we demonstrate that significantly more information can 
be deduced from the example collision. Namely, that these details are 
actually sufficient to reconstruct the collision attack to a great extent 
using some weak logical assumptions. In particular, we contribute an 
analysis of the differential path family for each of the four near-collision 
blocks, the chaining value differences elimination procedure and a com- 
plexity analysis of the near-collision block attacks and the associated 
birthday search for various parameter choices. Furthermore, we were able 
to prove a lower-bound for the attack’s complexity. 

This reverse-engineering of a non-academic cryptanalytic attack 
exploited in the real world seems to be without precedent. As it allegedly 
was developed by some nation-state(s) [11,12,19], we discuss potential 
insights to their cryptanalytic knowledge and capabilities. 


Keywords: MD5 • Hash function • Cryptanalysis • Reverse 
engineering • Signature forgery 


1 Introduction 

1.1 End-of-Life of a Cryptographic Primitive 

The end-of-life of a widely-used cryptographic primitive is an uncommon event, 
preferably orchestrated in an organized fashion by replacing it with a next gener- 
ation primitive as a precaution as soon as any kind of weakness has been exposed. 
Occasionally such idealistic precautions are thrown to the wind for various rea- 
sons. Unfortunately, the sudden introduction of practical attacks may then seri- 
ously reduce the security of systems protected by the cryptographic primitive. 
The ensuing forced mitigation efforts need to overcome important hurdles in a 
short amount of time and thus prove to be less successful than precautionary
mitigation efforts. The topic of this paper, namely an exposed cryptanalytic 
attack on the hash function MD5 exploited in the real-world eight years after 
the first practical break of MD5, is a recent example of the above. 


1.2 Collisions for MD5 

The cryptographic hash function MD5 found widespread use for many years
since its inception in 1991 by Ron Rivest [21]. It became the de facto industry
standard in combination with RSA to generate digital signatures upon which our
Internet's Public Key Infrastructure (PKI) for TLS/SSL has been built. This was
despite early collision attacks on the compression function of MD5 by den Boer
and Bosselaers [2] and Dobbertin [6].

That changed in 2004, when the first real MD5 collision attack, as well as
example collisions, were presented by Wang et al. in a major breakthrough in
hash function cryptanalysis [28,29]. Improvements to their attack were published
in a series of papers (e.g., see [9,10,13,22,24,27,30,31]). Unfortunately,
no convincing threatening scenarios arose due to the important restriction that
colliding message pairs $M = P\|C\|S$, $M' = P\|C'\|S$ can only differ in the
random-looking $C$, $C'$.

This restriction was lifted with the introduction of the first chosen-prefix
collision attack on MD5 [26] that for any two equal-length prefixes $P$ and $P'$
constructs short random-looking $C$ and $C'$ such that $P\|C\|S$ and $P'\|C'\|S$ collide
for any common suffix $S$. Chosen-prefix collisions make it significantly easier
to construct collisions with meaningful differences, i.e., often it suffices to choose
$M$ and $M'$ appropriately and to hide $C$ and $C'$ somewhere within the messages.
It enabled the first truly convincing attack scenario using MD5 collisions, namely
the construction of a rogue Certificate Authority (CA) certificate presented in
2009 [27]. As it turned out, many CAs had voluntarily stopped using MD5. Nevertheless,
the remaining few MD5-using CAs endangered the entire PKI as any
PKI is only as strong as its weakest link, i.e., CA.

Based on these developments, various authorities explicitly disallowed MD5
in digital signatures (e.g., the CA/Browser Forum adopted Baseline Requirements
for CAs in 2011$^1$, Microsoft updated its Root CA Program in 2009$^2$).


1.2.1 Counter-Cryptanalysis 

Due to its widespread and pervasive use, MD5 remains supported to accommodate
old signatures even up to the time of this writing. Any party worldwide
still signing with an MD5-based digital signature scheme - against all advice -
may be attacked using a chosen-prefix collision attack. Furthermore, a resulting
digital signature forgery can be exploited against nearly everyone due to the
near-ubiquitous support of MD5-based signatures. Stevens recently proposed
to counter these threats using counter-cryptanalysis [25], specifically a collision
detection algorithm, i.e., an algorithm that asserts whether any given single
message belongs to a colliding message pair that was constructed using an MD5
and/or SHA-1 collision attack. The main idea is to guess the colliding part (i.e.,
the $C'$) of the assumed sibling colliding message and to verify whether an internal
collision occurs. Once a collision has been verified, one knows the near-collision
blocks for both messages; however, one cannot reconstruct earlier parts of the
missing message with counter-cryptanalysis.

$^1$ https://cabforum.org/wp-content/uploads/Baseline_Requirements_V1.pdf

$^2$ http://technet.microsoft.com/en-us/library/cc751157.aspx

Collision detection can strengthen digital signatures by invalidating forged
digital signatures, thereby allowing the continued secure use of MD5-based signatures.
However, collision detection is clearly not a permanent solution and
cannot replace proper migration to the more secure SHA-2 and SHA-3.

1.3 The Super-Malware ‘Flame’ 

1.3.1 Background 

Flame is a highly advanced malware for espionage and was discovered in May 
2012 by the Iranian CERT, Kaspersky Lab and CrySyS Lab [11,12]. It seemed 
to have targeted the Middle-East, with the most infections in Iran. Among the 
targets were government-related organizations, private companies, educational 
institutions as well as specific individuals. According to these reports by malware
experts, Flame was almost certainly developed by one or more nation-states. The
best report so far on its origin seems to be a Washington Post article reporting
that - according to unnamed officials and experts - Flame was a joint U.S.-Israel
classified effort [19].

For espionage, Flame collected keyboard inputs, Skype conversations and 
local documents of potential interest. It could also record screen contents, micro- 
phone audio, webcam video as well as network traffic, sometimes triggered by 
the use of specific applications of interest like Instant Messaging applications. 

According to Kaspersky [12], Flame was active since at least 2010. How- 
ever, CrySyS Lab reports Flame or a preliminary version thereof may have been 
active since 2007 due to an observed file in the security enterprise webroot in 
2007. Infections seem to have occurred with surgical precision with each target 
carefully selected instead of wildly spreading, which may be one of the reasons 
why it has evaded discovery for several years. 

We refer to the analyses by Kaspersky Lab and CrySyS Lab [11,12] for more 
details on the functionality, purpose and origin of Flame. Here we focus on the 
variant chosen-prefix collision attack that enabled its propagation. 

1.3.2 Propagation 

As described by Sotirov [23], Flame used WPAD (Web Proxy Auto-Discovery 
Protocol) to register itself as a proxy for the domain update.windows.com to 
launch Man-In-The-Middle attacks for Windows Update on other computers on 
the local network. By forcing a fall-back from the secure HTTPS protocol to 
the insecure HTTP protocol, Flame was able to push validly signed Windows
Update patches of its choice. This included a properly, but illegitimately, signed
Windows Update patch by which Flame could spread to other machines. Flame’s
code-signing certificate for this security patch was obtained by fooling a certain
part of Microsoft’s PKI into signing a colliding - innocuous-looking - sibling
certificate using an MD5-based signature algorithm. As the to-be-signed parts
of both certificates were carefully crafted to result in the same MD5-hash using
a chosen-prefix collision attack, the MD5-based signature was valid for both
certificates.

Even though Microsoft was fully aware of the severe weaknesses of MD5 and 
spent great effort on migrating to more secure hash functions for new digital 
signatures at least since 2008, their software continued to accept (old) MD5- 
based digital signatures. Unfortunately, the use of MD5-based signatures for 
licensing purposes in their Terminal Server Licensing Service was overlooked in 
their efforts. 3 This, together with other unforeseen circumstances, allowed the 
creation of Flame’s properly, but illegitimately, signed security patch that was 
trusted by all versions of Windows [16]. 4 

1.3.3 Unknown Variant Chosen-Prefix Collision Attack 

On the 3rd of June 2012, Microsoft blogged that in their initial analysis of Flame
they “identified that an older cryptography algorithm could be exploited and then
be used to sign code as if it originated from Microsoft” [17]. An immediate guess
was that this cryptically worded statement refers to the construction of a rogue
code-signing certificate using a chosen-prefix collision attack on MD5 similar
to [27]. Only the certificates in the chain leading to the forged signature on
Flame’s executable were circulating on the Internet [20]; the sibling colliding
certificate remains lost. Using his collision detection technique, Stevens was able
to reconstruct the collision part of the missing sibling colliding certificate [25].

Having both colliding parts one can observe the differential paths used for this 
attack which Stevens uses to provide a preliminary analysis of Flame’s attack: 

Flame’s differential paths clearly show a chosen-prefix collision attack that
starts with a chaining-value difference containing many bit differences that is
gradually reduced to zero by the four sequential “near-collision” block pairs.
However, these differential paths do not match any family of published
chosen-prefix collision attacks [27], but instead are variants based on the first
differential paths for MD5 by Wang et al. [29]. Also, they show characteristics
that do not match those from known differential path construction methods for
MD5. The author provides arguments indicating that an unnecessarily costly
differential path construction method was used. Furthermore, experimental
results were given constructing replacement paths with significantly fewer
bitconditions in only about 15 s on average on a single Intel i7-2600 CPU
(equivalent to about $2^{29}$ MD5 compressions).


3 Microsoft invalidated this part of their PKI after the discovery of Flame in 2012. 

4 Any license certificate produced by the Terminal Server Licensing Service could 
directly be used to attack Windows Vista and earlier versions, but not later versions.

Based on the differential paths and the observation that the best known
message modification technique was used, for each block a lower bound for the
average complexity to find the near-collision blocks is given. Note the implicit
assumption that the differential path including the target output chaining value
difference is fixed before the near-collision block search.

Based on the weight of the observed chaining value difference after the
birthday search that needs to be eliminated by the four near-collision attacks,
an indicative complexity estimate of about $2^{42}$ MD5 compressions is given,
although further constraints make it more likely to be higher rather than
lower. Lacking a more detailed analysis of the chaining value difference
elimination strategy, no more accurate prediction could be given.

Although Stevens was able to show a yet unknown variant attack was used, so 
far, no reconstruction of Flame’s attack has been presented and many questions 
regarding techniques and complexities remained unanswered. Specifically there 
is no analysis so far for the possible differential path family for each block, and 
therefore for the chaining value reduction procedure that selects which chaining 
value differences (the tail of the differential path) to eliminate in each block. 
This in turn makes it hard to provide accurate complexity estimates for each 
of the four near-collision attacks as well as for the associated birthday attack. 
Furthermore, the work in this paper makes it clear that Stevens’ assumption that 
each near-collision block targets a specific chaining value difference is inaccurate, 
making his preliminary comments on the attack complexity incorrect. 

2 Our Contributions 

In [25] Stevens presented proof that Flame uses a yet unknown chosen-prefix 
collision attack and made indications of the complexity to find solutions for the 
recovered differential paths. No attack reconstruction or more accurate complex- 
ity estimates were given. 

Our paper is entirely based on the four near-collision block pairs shown 
in Appendix B that can be recovered from the single available certificate in 
Flame’s attack using counter-cryptanalysis. This paper significantly improves 
upon Stevens’ preliminary reconstruction and we demonstrate for the first time 
that a single example of a collision pair is actually sufficient to reconstruct the 
used collision attack to a great extent under some weak logical assumptions. 
Furthermore, the high level of detail of our reconstruction even admits concrete 
conclusions under a complexity analysis, specifically we prove a lower-bound for 
the estimated attack complexity and provide a cost figure for the closest fit of 
attack parameters. Our work shows that Stevens’ indications of the near-collision 
costs are not the real expected costs. In particular the attack does not use fixed 
differential paths, but allows some random chaining value differences to occur 
in the first two blocks that can be efficiently negated in the last two blocks. 
Lacking more information about the near-collision attack procedures, Stevens 
was also unable to give real indications of the birthday search complexity of 
Flame’s chosen-prefix collision attack. However, our reconstruction as well as our
complexity analysis includes the birthday search and shows there is a trade-off
between the birthday search cost and the total cost of the near-collision attacks.

At a high level we can draw some insights from our analysis into the
cryptanalytic capabilities and the available resources of Flame’s creators. In
particular, the complexity for the closest fit of attack parameters is equivalent
to $2^{49.3}$ MD5 compressions, which takes roughly 40,000 CPU core hours. That
means that for, say, 3-day attempts to succeed in reasonable time given the large
number of required attempts, one needs about 560 CPU cores, which is large but
not unreasonable even for academic research groups. Compared with the estimated
complexity of $2^{44.55}$ MD5 compressions from [27], this seems to be suboptimal.
Not only the overall complexity, but also the differential path construction
method and the near-collision speed-up techniques seem to be suboptimal. Overall
we can report that it is clear that significant expertise in cryptanalysis was
required, yet there are no indications at all of superior techniques; instead,
various parts are suboptimal. It seems a working attack that succeeds in
reasonable time was more important than optimizing the overall attack using all
of the state-of-the-art techniques. 5
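The resource figures in this paragraph can be checked with a few lines of arithmetic. The sketch below only uses the numbers quoted above; the variable names are illustrative, and the per-core throughput is a derived quantity that the text does not state.

```python
# Figures quoted in the text.
CORE_HOURS_PER_ATTEMPT = 40_000       # cost of one attack of 2^49.3 compressions
ATTEMPT_WALL_CLOCK_HOURS = 3 * 24     # one attempt should finish within 3 days
LOG2_CLOSEST_FIT = 49.3               # closest fit of attack parameters
LOG2_PUBLISHED = 44.55                # estimate from [27]

# Number of cores needed so that one attempt takes about 3 days.
cores_needed = CORE_HOURS_PER_ATTEMPT / ATTEMPT_WALL_CLOCK_HOURS

# Cost of the reconstructed attack relative to the published estimate.
overhead_factor = 2 ** (LOG2_CLOSEST_FIT - LOG2_PUBLISHED)

# Implied throughput per core (derived here, not stated in the text).
compressions_per_core_second = 2 ** LOG2_CLOSEST_FIT / (CORE_HOURS_PER_ATTEMPT * 3600)

print(f"cores needed for a 3-day attempt: {cores_needed:.0f}")
print(f"overhead versus [27]: about {overhead_factor:.1f}x")
print(f"implied rate: {compressions_per_core_second:.2e} compressions per core-second")
```

The core count comes out slightly below the rounded figure of 560 given in the text, and the gap to the estimate from [27] is a factor of $2^{4.75}$.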

Noteworthy is the following thought by an anonymous reviewer: developing
a new variant attack required significant human effort which would have been
unnecessary if its creators had enough computational power to do a general
birthday search of complexity $2^{64.85}$ MD5 compressions in reasonable time. This
may indicate a reasonable upper bound on available resources. However, given
the public availability of the Hashclash tools [8] since mid 2009, developing a
new attack might have been unnecessary in the first place, which would imply they
explicitly chose to build their attack, or to use their already built attack, for
Flame for some reason.

At a more detailed level, our analysis revealed that a central idea behind 
the attack seems to be that the near-collision blocks operate in pairs: The first
two blocks together eliminate one part of the intermediate hash value differen- 
tial, allowing mostly random changes to other parts. The remaining differences 
(including the random changes from the first pair) are eliminated by the second 
two blocks. This idea allows a significant reduction in the expected complexity 
compared to the previous estimate by Stevens [25], where each near-collision 
pair was assumed to target specific intermediate hash value differences. 

We have deduced the most likely parametrized family of differential paths 
for each near-collision block from which one is selected to eliminate specific 
intermediate hash value differences, as well as the complementary parametrized 
birthday search procedure that results in an intermediate hash value difference 
that can be eliminated using the 4 families of differential paths. We provide 
a complexity analysis for plausible parameter choices. Furthermore, we prove
Theorem 6 stating a lower-bound complexity independent of parameter choices
to be $2^{46.6}$ calls to the compression function in Sect. 4.3.3, and provide parameter


5 At the time there seems to have been no reason to hold back more advanced tech- 
niques, given that counter-cryptanalysis was not publicly known then. Also, if there 
was a concern about revealing their knowledge then they could have easily used the 
publicly available Hashclash tools [8] instead.

choices that achieve this cost. Sotirov estimated that obtaining their forgery
was significantly more difficult than the original Rogue CA construction, thus
requiring many collisions in order to succeed [23]. This indicates that significant
computational resources need to have been brought to bear to execute each
chosen-prefix collision attack in a relatively short amount of time in order to
succeed in their overall aim to obtain a forgery.

Lacking more examples or other hints about the actual attack procedure, it 
seems to be very hard to determine more specifics of Flame’s chosen-prefix col- 
lision attack with any significant level of certainty. This includes the differential 
path construction algorithm and the collision search algorithm. For more details 
and analysis of less important aspects to our complexity analysis we refer to the 
full version of this paper. 

The remainder of this paper is as follows. We start in Sect. 3 with an exposi- 
tion of the main known techniques for chosen-prefix collision attacks. In Sect. 4.1, 
we break down the data from the recovered near-collision block pairs. We present 
our reconstruction in Sect. 4.2 and its complexity analysis in Sect. 4.3. 

3 MD5 Chosen-Prefix Collision Attacks 

3.1 MD5 

The hash function MD5 maps an arbitrarily long input message M to a 128-bit 
output string. Its design follows the Merkle-Damgård construction [5,15], using
a compression function which we call MD5Compress and a chaining value denoted 
IHV. 

1. Unambiguously pad M to a length that is a multiple of 512.

2. For $i = 0, \ldots, N-1$, let $M_i$ denote the $i$th 512-bit block of M. Let

   $IHV_0 = IV = (\mathrm{67452301}_{16}, \mathrm{efcdab89}_{16}, \mathrm{98badcfe}_{16}, \mathrm{10325476}_{16})$

3. For $i = 1, \ldots, N$, let $IHV_i = \mathrm{MD5Compress}(IHV_{i-1}, M_{i-1})$.

4. Output $IHV_N$ converted back from little-endian representation.

The description of MD5Compress we give here is not the standard one but an
equivalent “unrolled” formulation [7] that is better suited for cryptanalysis. The
compression function has 64 steps and computes a sequence of working states
$Q_t$ for inputs $IHV_{in} \in \{0,1\}^{128}$, $M \in \{0,1\}^{512}$:

1. Split $IHV_{in}$ and $M$ into 32-bit words; $IHV_{in} = a\|b\|c\|d$, $M = m_0\|\ldots\|m_{15}$.

2. Let $Q_{-3} = a$, $Q_{-2} = d$, $Q_{-1} = c$ and $Q_0 = b$.

3. For $t = 0, \ldots, 63$, compute

   $F_t = f_t(Q_t, Q_{t-1}, Q_{t-2})$;  $T_t = F_t + Q_{t-3} + AC_t + W_t$;

   $R_t = RL(T_t, RC_t)$;  $Q_{t+1} = Q_t + R_t$.

4. Output $IHV_{out} = (Q_{61} + a, Q_{64} + b, Q_{63} + c, Q_{62} + d)$.

where $AC_t = \lfloor 2^{32} \cdot |\sin(t+1)| \rfloor$ and $W_t$, $f_t(X, Y, Z)$ and $RC_t$ are given by:

Step               $W_t$                    $f_t(X, Y, Z)$                                  $RC_t$
$0 \le t < 16$     $m_t$                    $(X \wedge Y) \oplus (\overline{X} \wedge Z)$   $(7, 12, 17, 22)[t \bmod 4]$
$16 \le t < 32$    $m_{(1+5t) \bmod 16}$    $(Z \wedge X) \oplus (\overline{Z} \wedge Y)$   $(5, 9, 14, 20)[t \bmod 4]$
$32 \le t < 48$    $m_{(5+3t) \bmod 16}$    $X \oplus Y \oplus Z$                           $(4, 11, 16, 23)[t \bmod 4]$
$48 \le t < 64$    $m_{(7t) \bmod 16}$      $Y \oplus (X \vee \overline{Z})$                $(6, 10, 15, 21)[t \bmod 4]$
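The unrolled formulation can be made concrete with a short sketch. The code below follows the working-state recurrence for $Q_t$ described above and compares the result with Python's `hashlib`; the function names (`md5_compress`, `md5`) and the padding helper are illustrative choices, not part of the paper.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Addition constants AC_t = floor(2^32 * |sin(t + 1)|).
AC = [int((1 << 32) * abs(math.sin(t + 1))) & MASK for t in range(64)]
# Rotation constants RC_t, one tuple per round, indexed by t mod 4.
RC = [(7, 12, 17, 22), (5, 9, 14, 20), (4, 11, 16, 23), (6, 10, 15, 21)]


def rl(x, n):
    """Rotate the 32-bit word x left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def f(t, x, y, z):
    """Round-dependent Boolean function f_t(X, Y, Z)."""
    if t < 16:
        return (x & y) ^ (~x & z)
    if t < 32:
        return (z & x) ^ (~z & y)
    if t < 48:
        return x ^ y ^ z
    return y ^ (x | (~z & MASK))


def w_index(t):
    """Index of the message word W_t used in step t."""
    if t < 16:
        return t
    if t < 32:
        return (1 + 5 * t) % 16
    if t < 48:
        return (5 + 3 * t) % 16
    return (7 * t) % 16


def md5_compress(ihv, block):
    """Unrolled MD5Compress: ihv is (a, b, c, d), block is 64 bytes."""
    a, b, c, d = ihv
    m = struct.unpack("<16I", block)
    q = {-3: a, -2: d, -1: c, 0: b}
    for t in range(64):
        f_t = f(t, q[t], q[t - 1], q[t - 2]) & MASK
        t_t = (f_t + q[t - 3] + AC[t] + m[w_index(t)]) & MASK
        q[t + 1] = (q[t] + rl(t_t, RC[t // 16][t % 4])) & MASK
    return ((q[61] + a) & MASK, (q[64] + b) & MASK,
            (q[63] + c) & MASK, (q[62] + d) & MASK)


def md5(message):
    """Merkle-Damgard iteration of md5_compress with standard MD5 padding."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack("<Q", (8 * len(message)) & 0xFFFFFFFFFFFFFFFF)
    ihv = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
    for i in range(0, len(padded), 64):
        ihv = md5_compress(ihv, padded[i:i + 64])
    return struct.pack("<4I", *ihv)


print(md5(b"abc").hex() == hashlib.md5(b"abc").hexdigest())
```

Keeping all working states $Q_{-3}, \ldots, Q_{64}$ in one table, rather than four rotating registers, is what makes this form convenient for stating differential paths step by step.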


3.2 General Approach 

When constructing a chosen-prefix collision pair $P\|C\|S_{any}$ and $P'\|C'\|S_{any}$ for
given prefixes $P$ and $P'$ and arbitrary suffix $S_{any}$, we may assume without loss of
generality that $P$ and $P'$ are of equal length and that their length is a multiple of
the MD5 message block size. (Otherwise, one can just add padding.) A chosen-prefix
collision attack consists of two stages. The first is the Birthday Search
where one searches for equal-length suffixes $S_b$ and $S'_b$ such that the difference
in the intermediate hash value after processing $P\|S_b$ and $P'\|S'_b$ has a particular
form necessary for the second stage. In the second stage, one constructs
near-collision block pairs $(S_1, S'_1), (S_2, S'_2), \ldots, (S_n, S'_n)$ such that after processing
$P\|S_b\|S_1\|\ldots\|S_n$ and $P'\|S'_b\|S'_1\|\ldots\|S'_n$ the intermediate hash values are equal.
Thus one has found the desired $C = S_b\|S_1\|\ldots\|S_n$ and $C' = S'_b\|S'_1\|\ldots\|S'_n$ for
which the pair $P\|C\|S_{any}$ and $P'\|C'\|S_{any}$ form a collision for any suffix $S_{any}$.
We explain the construction of the near-collision block pairs below.


3.3 Differential Cryptanalysis 

Differential cryptanalysis is based on the analysis of the propagation of input 
differences throughout a cryptosystem. This technique was publicly introduced 
in 1993 by Eli Biham and Adi Shamir who first applied it to block ciphers [1]. 
Differential cryptanalysis of hash functions has been very successful. One of the 
key techniques introduced by Wang et al. against MD5 was the simultaneous 
use of the difference modulo $2^{32}$ and the bitwise XOR difference resulting in a
bitwise signed difference. 

Let $I$ and $I'$ be two different inputs. For any variable $X$ involved in the
computation for input $I$, we denote the respective variable for input $I'$ as $X'$.
For $X, X' \in \mathbb{Z}_{2^{32}}$, we denote by $\delta X = X' - X \bmod 2^{32}$ the arithmetic
differential. When it is necessary to keep track of the bitwise differences as well, we
use the Binary Signed Digit Representation (BSDR). The BSDR differential is
$(\Delta X[i])_{i=0,\ldots,31}$ where $\Delta X[i] = X'[i] - X[i] \in \{-1, 0, 1\}$. We can easily calculate
the arithmetic difference from the BSDR: $\delta X = \sum_{i=0}^{31} \Delta X[i] \cdot 2^i \bmod 2^{32}$.

For a BSDR $\Delta X$, we define the weight $w(\Delta X)$ as the number of indices
$i$ where $\Delta X[i] \neq 0$. For $\delta X \neq 0$, there are multiple BSDRs $\Delta X$ such that
$\delta X = \sum_{i=0}^{31} \Delta X[i] \cdot 2^i$. However, there is a normal form, called the non-adjacent
form (NAF). The non-adjacent form of $\delta X$ is the unique BSDR $\Delta X$ such that
$\sum_{i=0}^{31} \Delta X[i] \cdot 2^i = \delta X$, $\Delta X[31] \ge 0$ and $\Delta X$ has no adjacent non-zero entries.
The NAF is a minimal-weight BSDR of $\delta X$. We define the NAF-weight $w(\delta X)$
as the weight of the NAF of $\delta X$.
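As an illustration of these definitions, the following sketch computes the NAF of a 32-bit arithmetic difference using the standard digit-by-digit recoding. The helper names `naf` and `naf_weight` are illustrative; any digit that would land on position 32 is dropped, since it vanishes modulo $2^{32}$.

```python
MOD = 1 << 32


def naf(delta):
    """Return the NAF of delta mod 2^32 as a list of 32 digits in {-1, 0, 1}.

    digits[i] is the signed digit at bit position i.
    """
    x = delta % MOD
    digits = [0] * 32
    i = 0
    while x and i < 32:
        if x & 1:
            # +1 if x = 1 (mod 4), -1 if x = 3 (mod 4); afterwards x is
            # divisible by 4, so the next digit is guaranteed to be zero.
            d = 2 - (x & 3)
            digits[i] = d
            x -= d
        x >>= 1
        i += 1
    return digits


def naf_weight(delta):
    """NAF-weight: the number of non-zero digits in the NAF of delta."""
    return sum(1 for d in naf(delta) if d != 0)


# 7 = 2^3 - 2^0 has NAF-weight 2, whereas its binary expansion has weight 3.
print(naf_weight(7))
# -2^24 mod 2^32 is a single signed digit.
print(naf_weight(-(1 << 24)))
```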
3.4 Differential Paths

A differential path is an exact description of how differences propagate through two
related evaluations of MD5Compress. In particular, a differential path describes
for every step $t$ the differences $\delta Q_{t-3}$, $\Delta Q_{t-2}$, $\Delta Q_{t-1}$, $\Delta F_t$, $\delta W_t$, $\delta T_t$, $\delta R_t$ and
$\delta Q_{t+1}$ such that:

- $\delta T_t = \delta Q_{t-3} + \sigma(\Delta F_t) + \delta W_t$;

- $\delta Q_{t+1} = \sigma(\Delta Q_t) + \delta R_t$;

- $\Pr[\Delta F_t \mid \Delta Q_{t-2}, \Delta Q_{t-1}, \Delta Q_t] > 0$;

- $\Pr[\delta R_t \mid \delta T_t] > 0$.

We say that an input pair $(IHV, m_0\|\ldots\|m_{15})$, $(IHV', m'_0\|\ldots\|m'_{15})$ for
MD5Compress solves the differential path up to step $t$ if differences for the message
block and the intermediate variables are as specified in the differential path up to
step $t$.

Although the first differential paths for MD5 were constructed entirely by
hand [29], two quite different ways to construct differential paths have since
been introduced: Stevens’ meet-in-the-middle approach [26] and De Cannière
and Rechberger’s coding-theory based technique [4,14].

Suppose a pair of inputs solves a differential path up to some step. This pair
of inputs might fail to solve the next step because of the Boolean function or
because of the bit rotation. To handle the Boolean functions, bit conditions are
used that allow efficient checks whether our inputs have the correct values for
$\Delta F_t$. The rotations are taken care of probabilistically.
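To see why rotations are handled probabilistically, one can estimate the distribution of $\delta R_t$ for a fixed $\delta T_t$ by sampling. The sketch below is a simple Monte Carlo experiment under the assumption that $T_t$ is uniformly random; the function name and the sample size are illustrative. Because an addition can carry at most once across the rotation boundary and once out of the word, at most four distinct values of $\delta R_t$ can occur.

```python
import random
from collections import Counter

MASK = 0xFFFFFFFF


def rl(x, n):
    """Rotate the 32-bit word x left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def rotation_distribution(delta_t, n, samples=20000, seed=1):
    """Estimate Pr[delta_R | delta_T] for rotation amount n.

    Samples T uniformly at random (an assumption of this sketch) and records
    delta_R = RL(T + delta_T, n) - RL(T, n) mod 2^32.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        t = rng.getrandbits(32)
        t_prime = (t + delta_t) & MASK
        counts[(rl(t_prime, n) - rl(t, n)) & MASK] += 1
    return {delta_r: c / samples for delta_r, c in counts.items()}


dist = rotation_distribution((1 << 20) + 1, 12)
for delta_r, p in sorted(dist.items(), key=lambda item: -item[1]):
    print(f"delta_R = {delta_r:#010x}  probability ~ {p:.3f}")
```

A differential path fixes one of these values of $\delta R_t$, so a candidate solution passes the rotation in step $t$ only with the corresponding probability.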

3.5 Tunnels 

Message modification, specifically Tunneling [10], is an important technique that
can drastically speed up collision attacks. Under some preconditions, a tunnel
allows us to change a certain working state bit $Q_t[i]$ and corresponding message
bits without affecting $Q_{t+1}, \ldots, Q_{t'}$ for some $t' > t$. As an example, consider the
most important known tunnel $T_8$ with the following requirements:

- $Q_9[i]$ is free, i.e., no difference and no boolean function bitcondition

- $Q_{10}[i] = Q'_{10}[i] = 0$, and $Q'_{11}[i] = Q_{11}[i] = 1$

Under these conditions, we can flip bits $Q_9[i] = Q'_9[i]$ and adjust $m_8$, $m_9$ and
$m_{12}$ without affecting $Q_{10}, \ldots, Q_{24}$ and $Q'_{10}, \ldots, Q'_{24}$.

To see why $T_8$ is useful, suppose that we have a differential path and a
partial solution thereof up to and including $Q_{24}$. We say that a bit-position
$i \in \{0, \ldots, 31\}$ is active for $T_8$ if it satisfies the requirements. We call the number
$k$ of active bit-positions the strength of $T_8$. The tunnel allows us to generate $2^k$
different partial solutions up to $Q_{24}$, one for each possible value of the active
bit-positions. Since the probability that a partial solution can be extended to a
full solution is rather small, cheaply generating many partial solutions reduces
the cost of the collision attack significantly. In Table 1, we describe the three
tunnels that are the most relevant for the Flame collision attack. In Sect. 4.1.3,
we discuss how these tunnels might have been used in the collision attack.
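The notion of tunnel strength can be sketched in code. The function below counts the bit positions that satisfy the two requirements listed for $T_8$; the function name, the argument layout and the `q9_free_mask` input (a bitmask marking where $Q_9$ carries neither a difference nor a bitcondition) are assumptions made for this illustration, and the sample words are arbitrary rather than taken from Flame's blocks.

```python
MASK = 0xFFFFFFFF


def t8_active_positions(q10, q10_prime, q11, q11_prime, q9_free_mask):
    """Return the bit positions i that are active for tunnel T_8.

    A position is active when Q10[i] = Q'10[i] = 0, Q11[i] = Q'11[i] = 1,
    and Q9[i] is free according to q9_free_mask (hypothetical input).
    """
    active = (~q10 & ~q10_prime & q11 & q11_prime & q9_free_mask) & MASK
    return [i for i in range(32) if (active >> i) & 1]


# Arbitrary example words; no differences in Q10 and Q11, all of Q9 free.
q10 = q10_prime = 0x0000FFFF
q11 = q11_prime = 0xFF00FF00
positions = t8_active_positions(q10, q10_prime, q11, q11_prime, MASK)
strength = len(positions)

print("active positions:", positions)
print("strength k =", strength, "giving", 2 ** strength, "partial solutions")
```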
Table 1. Most important tunnels for MD5Compress

Tunnel  Flip bit      Aux. bitconditions                 Affected states            Affected message words
$T_4$   $Q_9[b]$      $Q_{10}[b] = Q_{11}[b] = 1$        $Q_{22}, \ldots, Q_{64}$   $m_8, m_9, m_{10}, m_{12}$
$T_5$   $Q_{10}[b]$   $Q_{11}[b] = 0$                    $Q_{22}, \ldots, Q_{64}$   $m_9, m_{10}, m_{12}, m_{13}$
$T_8$   $Q_9[b]$      $Q_{10}[b] = 0$, $Q_{11}[b] = 1$   $Q_{25}, \ldots, Q_{64}$   $m_8, m_9, m_{12}$


4 Reverse-Engineering Flame’s Attack 

4.1 Breakdown of Data 

In Appendix B we list the chaining values and near-collision blocks from the 
available Flame certificate and the ones for the associated ‘legitimate’ certifi- 
cate that can be recovered using counter-cryptanalysis. The differential path for 
each near-collision block pair can directly be observed by comparing the two 
compression function computations. In this section we first list several specific 
observations about these (reconstructed) Flame near-collision blocks and the 
observed differential paths that are relevant to our reconstruction. 


4.1.1 Some Features of the Near-Collision Blocks 
Observation 1 ([23]). Due to the constrained space where the near-collision blocks
were to be hidden in the certificate, the collision attack could only use four
near-collision blocks.

Observation 2. Blocks 1 and 3 of the Flame collision attack use the message
block differences from the first differential path of Wang et al.’s identical-prefix
attack, $\delta m_4 = \delta m_{14} = 2^{31}$, $\delta m_{11} = 2^{15}$ and $\delta m_i = 0$ for $i \notin \{4, 11, 14\}$. Blocks 2
and 4 use the differences from the second differential path of the identical-prefix
attack, $\delta m_4 = \delta m_{14} = 2^{31}$, $\delta m_{11} = -2^{15}$ and $\delta m_i = 0$ for $i \notin \{4, 11, 14\}$.

Observation 3. The working state differences $\Delta Q_6$ are maximal in all four
near-collision blocks, i.e., for every $i = 0, \ldots, 31$, we have $\Delta Q_6[i] \neq 0$. The $\Delta Q_6$
of Blocks 1 and 3 are equal, likewise for Blocks 2 and 4.

Observation 4. The four blocks all have a common structure: Up to and including
step 5, the differences $\delta Q_t$ vary among all four blocks. Then, there is a
maximal difference in step 6. After that, the values for $\Delta Q_t$ and $\Delta F_t$ are mostly
identical in Blocks 1 and 3 and in Blocks 2 and 4, leading up to long sequences
of trivial steps. The final five steps again differ greatly among all four blocks.


4.1.2 Notes on Differential Path Construction 

From this last observation, we conclude that a differential path beginning based
on the input IHVs and a differential path ending were generated separately and
then combined. Such differential path construction can be done for MD5 using
Stevens’ meet-in-the-middle approach [26] or De Cannière and Rechberger’s
coding-theory based technique [4,14]. The latter technique is less likely to have
been used, since none of the observed differential paths show its characteristic very
long carry chains over the non-predetermined part $Q_1, \ldots, Q_5$. Stevens already
showed that suitable differential paths can be constructed in about 15 s on an
Intel i7-2600 CPU, so in time equivalent to approximately $2^{29}$ MD5
compressions [25]. As this shows that differential path construction can be done very fast
and does not have to cost a significant fraction of the overall attack complexity,
and lacking more example collisions for analysis, our paper will focus on the
complexity-wise more costly parts of the attack.

4.1.3 Tunnel Strengths in the Near-Collision Blocks 

In order to estimate the complexity of the Flame collision attack, it is important
to know whether and to what extent the attackers used tunnels. The tunnels $T_4$,
$T_5$ and $T_8$ are the most important in speeding up the attack. See Table 1 for a
description of the three relevant tunnels.

Observation 5 ([25, Sect. 3.3]). The table below lists per near-collision block the
observed strength of tunnel $T_8$, the maximal strength possible given the respective
differential path and the average strength that would have been observed if the
tunnel was not used.


Near-collision Block   Observed strength   Maximal strength   Average strength
1                      7                   17                 4.25
2                      13                  18                 4.5
3                      10                  17                 4.25
4                      9                   18                 4.5


It is clear that the tunnel $T_8$ has been used, since the observed tunnel
strengths are much larger than one expects to see if $T_8$ was not used. Although
not presented here, we would like to note that the tunnel strengths for $T_4$ and $T_5$ are
smaller than average, but one cannot conclude that $T_4$ and $T_5$ were not used
since for each bit only one of $T_4$, $T_5$ and $T_8$ can be active due to conflicting
preconditions.

For our complexity estimates, we will assume the strengths of these three
tunnels to be the average over all four blocks. That is, we assume that tunnel $T_4$
has strength 3, $T_5$ has strength 7.5 and $T_8$ has strength 9.75. Reconstructing the
exact tunnel-exploitation method would be interesting and could lead to more
precise complexity estimates. We discuss some possible methods in Sect. 4.2.3.
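The numbers in Observation 5 can be cross-checked with a small calculation. The sketch below assumes that, when the tunnel is not deliberately used, each eligible bit position satisfies the two auxiliary conditions on $Q_{10}$ and $Q_{11}$ independently with probability 1/4; this assumption is an interpretation that happens to reproduce the "average strength" column, not a statement taken from [25].

```python
# (observed strength, maximal strength) of tunnel T_8 per near-collision block,
# as listed in the table of Observation 5.
t8_strengths = {1: (7, 17), 2: (13, 18), 3: (10, 17), 4: (9, 18)}

# Assumption of this sketch: without deliberate use of the tunnel, each of the
# two auxiliary bitconditions holds with probability 1/2, independently.
expected_if_unused = {block: maximal / 4
                      for block, (_, maximal) in t8_strengths.items()}

# Average observed strength over the four blocks, as used for the estimates.
mean_observed = sum(obs for obs, _ in t8_strengths.values()) / len(t8_strengths)

for block, (observed, maximal) in t8_strengths.items():
    print(f"block {block}: observed {observed}, maximal {maximal}, "
          f"expected if unused {expected_if_unused[block]}")
print("mean observed strength of T_8:", mean_observed)
```

The mean of the observed strengths agrees with the value 9.75 assumed for $T_8$ in the complexity estimates.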

4.2 Our Reconstruction of the Chosen-Prefix Collision Attack 

In this section, we describe our reconstruction of the collision attack, in particular 
the differential path construction, the families of differential paths endings that 
were used, the cost of the Birthday Search and of the message block construction.

Central to our reconstruction attempt is the idea that the first two blocks
eliminate $\delta c$ from $\delta IHV = (\delta a, \delta b, \delta c, \delta d)$ up to a constant term while allowing
random changes in parts of $\delta b$. The second two blocks then eliminate $\delta b$ and the
constant term in $\delta c$. This allows for the first two blocks to be constructed faster
than estimated in the preliminary analysis in [25].

It seems that the four near-collision attacks can be grouped into two pairs:
Blocks 1 and 2 form a pair, and, likewise, Blocks 3 and 4. In each of the pairs, the
first block uses the message block differences of the first near-collision block in
the identical-prefix attack by Wang et al. and the second block uses the difference
of the second near-collision block in that attack. That is, in Blocks 1 and 3, the
only differences in the message block are $\delta m_4 = \delta m_{14} = 2^{31}$ and $\delta m_{11} = 2^{15}$. In
Blocks 2 and 4, the differences are negated, i.e., $\delta m_{11} = -2^{15}$.
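The two message-difference patterns are easy to state in code. The sketch below applies them to a 64-byte block interpreted as sixteen little-endian 32-bit words; the names `DELTAS_BLOCKS_1_3`, `DELTAS_BLOCKS_2_4` and `sibling_block` are illustrative, and the input block is an arbitrary placeholder rather than one of the recovered blocks.

```python
import struct

MASK = 0xFFFFFFFF

# Additive differences per message word index (all other words are unchanged).
DELTAS_BLOCKS_1_3 = {4: 1 << 31, 11: 1 << 15, 14: 1 << 31}
DELTAS_BLOCKS_2_4 = {4: 1 << 31, 11: -(1 << 15), 14: 1 << 31}


def sibling_block(block, deltas):
    """Return the block m' with m'_i = m_i + delta m_i mod 2^32."""
    words = list(struct.unpack("<16I", block))
    for index, delta in deltas.items():
        words[index] = (words[index] + delta) & MASK
    return struct.pack("<16I", *words)


block = bytes(range(64))                      # placeholder message block
first = sibling_block(block, DELTAS_BLOCKS_1_3)
changed = [i for i in range(16) if block[4 * i:4 * i + 4] != first[4 * i:4 * i + 4]]
print("message words that differ:", changed)

# The second pattern negates the first modulo 2^32 (2^31 is its own negative).
print(sibling_block(first, DELTAS_BLOCKS_2_4) == block)
```

Since $2^{31} \equiv -2^{31} \pmod{2^{32}}$, only the sign of $\delta m_{11}$ distinguishes the two patterns as arithmetic differences.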

To determine the complexity of the Birthday Search and of the message block 
construction algorithm, we describe a family of end-segments of the differential 
path for each of the four near-collision blocks. We compute the complexity of 
the Birthday Search and the complexity of the algorithm for generating near- 
collision blocks on the basis of our reconstruction of the end-segments. 


Table 2. Chaining value difference corrections ($\delta IHV_{out} - \delta IHV_{in}$) for each block

             Block 1                              Block 2
$\delta a$   [31]                                 [31, 5]
$\delta d$   [31, 25]                             [31, -25, -9, 5]
$\delta c$   [31, 25, -14, -12, 9]                [31, 26, 24, 20, -9, 5]
$\delta b$   [31, 25, -18, -15, -12, 9, 1]        [-26, 24, 21, -14, -9, 5, 0]

             Block 3                              Block 4
$\delta a$   [31]                                 [31]
$\delta d$   [31, 25, 9]                          [31, -25, -9]
$\delta c$   [31, 26, -24, -14, 9]                [31, -25, 14, -9]
$\delta b$   [30, 26, -24, 20, -17, 15, 9, -3]    [-25, 14, -9, -5, 3, 0]


4.2.1 Differential Path Family 

Four near-collision blocks are used to eliminate the chaining value differences 
after the birthday search of the chosen-prefix collision attack. In this section we 
reconstruct the family of differential paths used for each of the four near-collision 
blocks based on the observed chaining value differences, the observed differential 
paths and the possible variations thereof that are compatible with the overall 
attack. 

In particular, each of the four near-collision attacks uses a carry expansion
of a particular bit difference in the last few steps of Wang’s original
differential paths for MD5 to allow for some controlled additional differences to affect
the chaining value differences. This can be seen in the recovered paths shown
in Appendix A: for each block there is a primary carry chain either in $\delta Q_{62}$
or $\delta Q_{63}$ starting at bit position either 5 or 25 used for controlled differences;
other small carry chains are random artifacts and not actively used. Our
reconstruction is based on these primary carry chains and we will parametrize the
amount of allowed carries. Using other carry chains significantly complicates the
overall attack strategy, does not lead to significant benefits and does not fit the
observed paths, hence we apply Ockham’s razor principle and keep to the most
straightforward explanation.

The differences that are added to $\delta IHV = (\delta a, \delta b, \delta c, \delta d)$ using each
near-collision block are summarized in Table 2. We begin with an outline of what we
assume to be the elimination strategy. The differences in $\delta c$ are eliminated by
the first two blocks using carry chains in $\delta Q_{62}$ starting at bit positions 25 and 5
respectively, but a difference of $-2^{24}$ is introduced which is then eliminated in
the final two blocks. For $\delta b$, matters are more complicated. Given the following
observations:

- deliberate changes to $\delta b$ possible in blocks 1,2 can be deferred to blocks 3,4;

- random changes to $\delta b$ possible in blocks 1,2 can be handled in blocks 3,4;

- blocks 1 and 2 actually increase the NAF-weight of $\delta b$;

we found that the best explanation is that the changes to $\delta b$ in the first two
blocks are mostly random and that the elimination of differences in $\delta b$ is done
in Block 3 and Block 4 using carry chains in $\delta Q_{63}$ starting at bit positions 25
and 5 respectively. This explanation in fact reduces the complexity for Blocks 1
and 2 as they only need to control $\delta Q_{64}$ (that affects $\delta b$) to a very small extent.
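The remark about the difference of $-2^{24}$ in $\delta c$ can be checked against Table 2. The sketch below reads each table entry as a list of signed bit positions (a negative entry $-k$ meaning $-2^k$, which is an interpretation assumed here) and sums the $\delta c$ corrections of Blocks 3 and 4 as transcribed from the table; the helper name `delta` is illustrative.

```python
MOD = 1 << 32


def delta(signed_bits):
    """Arithmetic difference mod 2^32 of a list of signed bit positions.

    An entry k >= 0 contributes +2^k and an entry -k contributes -2^k. Bit 0
    cannot carry a minus sign in this encoding; the entries used below do not
    need one.
    """
    total = 0
    for bit in signed_bits:
        total += (1 << abs(bit)) if bit >= 0 else -(1 << abs(bit))
    return total % MOD


# delta c corrections of Blocks 3 and 4, transcribed from Table 2.
dc_block3 = [31, 26, -24, -14, 9]
dc_block4 = [31, -25, 14, -9]

correction_3_and_4 = (delta(dc_block3) + delta(dc_block4)) % MOD
remaining_before_block3 = (-correction_3_and_4) % MOD

print(f"correction added by Blocks 3 and 4: {correction_3_and_4:#010x}")
print(f"delta c left by Blocks 1 and 2:     {remaining_before_block3:#010x}")
```

The last two blocks together add $+2^{24}$ to $\delta c$, so for the final difference to vanish, Blocks 1 and 2 must leave exactly $-2^{24}$, in line with the strategy outlined above.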


Table 3. End segment of block 1. 


Steps  Bitconditions
60     +BBBB1B
61     +BBBB1B
62     X+
63     X + DDDDD+D
64     *** . . . . .***** ***AAA+A . . . . ****


5Q 63 = 2 31 + 2 25 + 2 9 + C 14 2 14 + Ci 2\ 1 < < 5 

$\delta Q_{64} = \delta Q_{63} + \sum_{i=29}^{31} X_i 2^i + \sum_{i=14}^{20} X_i 2^i + \sum_{i=0}^{v_1} X_i 2^i, \quad -1 \le v_1 \le 3$


We have generalized the observed differential path endings to a reasonable
extent, i.e., making our reconstructed path families more general would make
matters significantly more complex, while similar benefits might also be obtained
by simply choosing larger parameters for our families below. The four differential
path families are described as follows. For Block $i$ we use a parameter $w_i$ that
specifies the length of the carry chain and thus over how many bits one can fully
control the differences. Block 4 uses an additional carry chain whose length is
determined by $u_4$. For Blocks 1 and 2 we use additional parameters $v_1$ and $v_2$
that control the number of bit positions in which random differences are allowed,
as they can be handled in Blocks 3 and 4; these parameters $v_1$ and $v_2$ depend
on the value of $u_4$.
Table 4. End segment of block 2.

Steps  Bitconditions
60     + BBBBBB1B BBB
61     - 0000 BBBBBB1B BB+
62     + - -+++++++
63     XDDDD-D+ DDDD -B B+-
64     **ddd-a+ aaad**** *** . . . - . . . +*****


$\delta Q_{63} = 2^{31} - 2^{26} + 2^{24} - 2^{9} + 2^{5} + \sum_i C_i 2^i$

$\delta Q_{64} = \delta Q_{63} + \sum_i B_i 2^i + B_{20} 2^{20} + \sum_{i=0}^{v_2} X_i 2^i + \sum_{i=13}^{19} X_i 2^i + \sum_{i=30}^{31} X_i 2^i$

$0 \le w_2 \le 6$, $-1 \le v_2 \le 4$

1. Block 1 uses a carry chain starting in $\delta Q_{62}$ at bit position 25 up to $25 + w_1$
to control differences in $\delta Q_{63}$ over bit positions 8 up to $8 + w_1$. Given the
differences that can be covered in Blocks 3 and 4, we can allow arbitrary
differences in $\delta Q_{64}$ at bit position ranges $[0, v_1]$, $[14, 20]$, and $[29, 31]$.

2. Block 2 uses a carry chain starting in $\delta Q_{62}$ at bit position 5 up to bit position
$9 + w_2$ to control differences in $\delta Q_{63}$ over bit positions 20 up to $24 + w_2$. Given
the differences that can be covered in Blocks 3 and 4, we can allow arbitrary
differences in $\delta Q_{64}$ at bit position ranges $[0, v_2]$, $[13, 19]$, and $[30, 31]$.

3. Block 3 uses a carry chain starting in $\delta Q_{63}$ at bit position 24 up to $26 + w_3$
to control differences in $\delta Q_{64}$ over bit positions 13 up to $15 + w_3$.

4. Block 4 uses a carry chain starting in $\delta Q_{63}$ at bit position 14 up to bit position
$14 + w_4$ to control differences in $\delta Q_{64}$ over bit positions 13 up to $15 + w_3$,
and a second carry chain at bit positions 9 up to $9 + u_4$ to control differences
over bit positions 30 up to $(30 + u_4 \bmod 32)$ that wrap around to the lower
bit positions.

Note that in Block 4, the parameter $u_4$ must be large enough to eliminate the
random changes to $\delta b$ that are made in Blocks 1 and 2. That is, if $\max(v_1, v_2) < 2$,
we need $u_4 \ge \max(v_1, v_2) + 2$, and otherwise we need $u_4 = 4$. In Sect. 4.3.2
we will estimate the complexity of solving these differential paths.
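
The constraint on the second carry chain of Block 4 can be written down as a small helper. This is only a sketch of the rule as stated above; the function name `min_u4` and the range checks (taken from the parameter ranges of Tables 3 and 4) are our own illustration, not code from the attack.

```python
def min_u4(v1: int, v2: int) -> int:
    """Smallest admissible u_4 for given v_1 and v_2.

    Rule from the text: if max(v_1, v_2) < 2, then u_4 >= max(v_1, v_2) + 2;
    otherwise u_4 = 4. The ranges checked below are those of the path families.
    """
    if not (-1 <= v1 <= 3):
        raise ValueError("v1 must lie in [-1, 3]")
    if not (-1 <= v2 <= 4):
        raise ValueError("v2 must lie in [-1, 4]")
    worst = max(v1, v2)
    if worst < 2:
        return worst + 2
    return 4


if __name__ == "__main__":
    for v1, v2 in [(-1, -1), (1, 0), (3, 4)]:
        print(f"v1={v1}, v2={v2}: u4 >= {min_u4(v1, v2)}")
```

For example, $v_1 = v_2 = -1$ permits $u_4 = 1$, while $v_1 \ge 1$, $v_2 \ge 0$ forces $u_4 \ge 3$.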


Table 5. End segment of block 3. 


Steps 

Bitconditions 

61 

+BBBBBBB 1 0 

62 

+BBBBB+B 0 + 

63 

X+ B- - + 

64 

. + ... + .- .... DDDD D-D . . . + -... 


$\delta Q_{64} = 2^{30} + 2^{26} - 2^{24} + 2^{9} - 2^{3} + \sum_{i=13}^{15+w_3} B_i 2^i$, $0 \le w_3 \le 4$

We now describe each differential path family more fully in Tables 3, 4, 5 and 6
by giving a template and specifying equations that the values of $\delta Q_{61}, \ldots, \delta Q_{64}$

Table 6. End segment of block 4.


Steps 

Bitconditions 

61 

+ . . . 

. . 1 . ... BBBBB 

11BBBBB 

62 


...BBBBB 

00BBBB- 

63 

X. . . 






64 

DD. . 


.+ -D DD-D+DDD 


SQ 6 4 = — 2 25 + 2 14 - 2 9 - 2 5 + 2 3 + ES3o“ 4,1) ^ + 

ESo' 2 + E£r 


$1 \le w_4 \le 6$, $0 \le u_4 \le 4$

must satisfy. In the templates, a symbol $q_t[i]$ at step (row) $t$ and bit position
(column) $i$ can be any of the following:

- ‘.’: represents $Q_t[i] = Q'_t[i]$;

- ‘+’: represents $Q_t[i] = 0$, $Q'_t[i] = 1$;

- ‘-’: represents $Q_t[i] = 1$, $Q'_t[i] = 0$;

- ‘0’: represents $Q_t[i] = Q'_t[i] = 0$;

- ‘1’: represents $Q_t[i] = Q'_t[i] = 1$;

- ‘^’: represents $Q_t[i] = Q'_t[i] = Q_{t-1}[i]$;

- ‘?’: represents $(Q_t[i] = Q'_t[i]) \wedge (Q_t[i] = 1 \vee Q_{t-2}[i] = 0)$;

- ‘D’: a variable differential bitcondition, i.e., $q_t[i] \in \{., +, -\}$;

- ‘B’: a variable Boolean function bitcondition, i.e., $q_t[i] \in \{., 0, 1, ?\}$;

- ‘X’: a non-constant bitcondition, i.e., $q_t[i] \in \{+, -\}$;

- ‘*’: a (for now) irrelevant differential bitcondition;

- ‘A’: the same differential as above, $q_t[i] = q_{t-1}[i]$.
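
To make the template symbols concrete, the following sketch checks the fixed bitconditions for a single bit and converts a 32-symbol row into the arithmetic difference it encodes. It assumes the usual reading of the symbols ('.' equal bits, '+' meaning $Q = 0$, $Q' = 1$, '-' meaning $Q = 1$, $Q' = 0$, and so on) and that a row lists bit 31 first; the function names are hypothetical and the family symbols 'D', 'B', 'X', '*', 'A' are deliberately not handled.

```python
def satisfies(symbol, q, q_prime, q_prev=None, q_prev2=None):
    """Check one fixed bitcondition.

    q = Q_t[i], q_prime = Q'_t[i], q_prev = Q_{t-1}[i], q_prev2 = Q_{t-2}[i].
    """
    if symbol == ".":
        return q == q_prime
    if symbol == "+":
        return q == 0 and q_prime == 1
    if symbol == "-":
        return q == 1 and q_prime == 0
    if symbol == "0":
        return q == 0 and q_prime == 0
    if symbol == "1":
        return q == 1 and q_prime == 1
    if symbol == "^":
        return q == q_prime == q_prev
    if symbol == "?":
        return q == q_prime and (q == 1 or q_prev2 == 0)
    raise ValueError(f"unsupported bitcondition symbol: {symbol!r}")


def signed_difference(row: str) -> int:
    """Arithmetic difference Q' - Q encoded by the '+' and '-' symbols of a row.

    The row is assumed to list bit 31 first and bit 0 last; all other
    symbols contribute no difference.
    """
    if len(row) != 32:
        raise ValueError("a row must contain exactly 32 symbols")
    total = 0
    for pos, sym in enumerate(row):
        bit = 31 - pos
        if sym == "+":
            total += 1 << bit
        elif sym == "-":
            total -= 1 << bit
    return total


if __name__ == "__main__":
    row = ".+...+.-" + "." * 24
    print(signed_difference(row) == 2**30 + 2**26 - 2**24)
```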

The equations may contain the following terms:

- $w_i$, $v_i$, $u_i$: Parameters of the differential path family. Higher values for the $w_i$
mean that the differential path family can cancel more differences but is, on
average, harder to solve.

- $C_i$, $B_i$: These terms can take on values in $\{-1, 0, 1\}$ and correspond to the
variable differential bitconditions (‘D’s) in step 63 or 64, respectively. A member
of the differential path family is determined by the $C_i$ and $B_i$.

- $X_i$: These terms take on values in $\{-1, 0, 1\}$ and correspond to the irrelevant
bitconditions (‘*’s). While the $B_i$ and $C_i$ fix a differential path in the family,
the $X_i$ are determined only after a successful near-collision block search.

We say that a pair of inputs $(IHV, B)$, $(IHV', B')$ to MD5Compress solves the
last four steps of the differential path if there is some setting for the $X_i$ such
that $\delta Q_{61}, \ldots, \delta Q_{64}$ satisfy the given equations. This definition is more lax
than the one we use elsewhere: we do not require a solution to satisfy the exact
bitconditions, but use bitconditions as a tool to show which $\delta Q_i$ are possible.





4.2.2 Birthday Search 

We calculate the Birthday Search complexity for the maximal parameter values.
The complexity for lower values follows easily: for each carry that is dropped,
the complexity increases by a factor of $2^{0.5}$.

Given the elimination strategy, we can now specify the Birthday Search target.
We require that there are fixed differences in $\delta a$ and $\delta d$ and that, for those
bit positions $i$ that we cannot manipulate in our four near-collision blocks,
$c[i] = c'[i]$ or $b[i] = b'[i]$ (after subtracting the constant bit differences).
Given these constraints, the Birthday Search looks for a collision of the function


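$$f(x) = (a, \hat{b}_{10}, \ldots, \hat{b}_{13}, \hat{b}_{21}, \ldots, \hat{b}_{26}, c_0, \ldots, c_7, c_{15}, \ldots, c_{19}, c_{31}, d)$$

where

$$(a, b, c, d) = \begin{cases} \text{MD5Compress}(IHV, B \| x) + (-2^5, 0, -2^5, 2^9 - 2^5) & x \text{ even} \\ \text{MD5Compress}(IHV', B' \| x) & x \text{ odd} \end{cases} \qquad \text{and} \qquad \hat{b} = b - c$$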


with $IHV$ and $IHV'$ the intermediate hash values after processing the two chosen
prefixes. Not every collision of $f$ is useful. The probability $p$ that a collision is
useful is at most 1/2 since we require that the two parts use different prefixes.
Therefore the expected number of compression function calls required to find a
useful collision is $\sqrt{\pi \cdot 2^{88} / (2 \cdot p)} \approx \sqrt{\pi} \cdot 2^{44} \approx 2^{44.8}$ [18].
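
As a quick numerical check of this estimate, the sketch below counts the output bits of $f$ (32 for $a$, 4 + 6 selected bits of $b - c$, 8 + 5 + 1 selected bits of $c$, and 32 for $d$) and evaluates the formula in the $\log_2$ domain. The function name is our own and $p = 1/2$ is the assumption used in the text.

```python
from math import log2, pi

# Output width of the Birthday Search function f, read off its definition.
F_OUTPUT_BITS = 32 + 4 + 6 + 8 + 5 + 1 + 32  # = 88


def birthday_search_log2_cost(output_bits: int, p: float) -> float:
    """log2 of sqrt(pi * 2^output_bits / (2 * p)) compression function calls."""
    return 0.5 * (log2(pi) + output_bits - log2(2 * p))


if __name__ == "__main__":
    cost = birthday_search_log2_cost(F_OUTPUT_BITS, p=0.5)
    print(f"expected cost ~ 2^{cost:.1f} calls to MD5Compress")
```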

As we already mentioned, we use parameters to make trade-offs between
message block construction and Birthday Search cost. For every carry we do not
rely on, we introduce another bit position where $b$ and $b'$ or $c$ and $c'$ may not
differ, increasing the Birthday Search complexity by a factor of $2^{0.5}$. This allows
us to trade off Birthday Search complexity against complexity in the message
block construction.


4.2.3 Tunnel Exploitation Analysis 

As explained in Sect. 4.1.3, the tunnel strength in the Flame differential paths
was neither average nor maximized. We now derive a formula for the expected
tunnel strength when each tunnel bit is active with probability $a$. Let $m$ be the
number of bits that could be active for $\mathcal{T}_8$. For a random solution up to step 24,
let $S$ be the random variable measuring the strength of $\mathcal{T}_8$ and solve the event
that the partial solution can be extended to a full solution using $\mathcal{T}_8$. Assuming
$\Pr[\text{solve} \mid S = k] \approx 2^k \cdot p$ for some $p$ independent of $k$, we can calculate


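$$E[S \mid \text{solve}] = \sum_{k=0}^{m} k \cdot \Pr[S = k \mid \text{solve}] = \frac{\sum_{k=0}^{m} k \binom{m}{k} a^k (1-a)^{m-k} 2^k}{\sum_{k=0}^{m} \binom{m}{k} a^k (1-a)^{m-k} 2^k}$$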


An explanation for the observed tunnel strengths (Observations) proposed
in [25] is that the Flame authors did not try to maximize the tunnel strength
but used tunnels in their message block construction algorithm to the extent
that they were available. This corresponds to setting $a = 1/4$. On the other
hand, we consider the alternative hypothesis that many bits in working state
$Q_{10}$ were fixed to ‘0’ to bring the probability closer to $a = 1/2$. In Table 7,
we list the expectation and standard deviation of the tunnel strength for both
values of $a$. These results show that the initial explanation by Stevens with
$a = 1/4$ is rather unlikely, while the explanation with $a = 1/2$ is more probable.
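
The expectation can be evaluated numerically under the stated model: each of the $m$ tunnel bits is active independently with probability $a$, and a solution with strength $k$ is found with probability proportional to $2^k$. The sketch below is our own illustration (the function name is hypothetical); for $a = 1/4$ and $m = 17$ or $18$ it gives roughly 6.80 ± 2.02 and 7.20 ± 2.08. Only the $a = 1/4$ case is checked here.

```python
from math import comb, sqrt


def tunnel_strength_stats(m: int, a: float):
    """Mean and standard deviation of the tunnel strength S given 'solve'.

    Assumes S ~ Binomial(m, a) a priori and Pr[solve | S = k] proportional to 2^k.
    """
    weights = [comb(m, k) * a**k * (1 - a) ** (m - k) * 2**k for k in range(m + 1)]
    total = sum(weights)
    mean = sum(k * w for k, w in enumerate(weights)) / total
    variance = sum((k - mean) ** 2 * w for k, w in enumerate(weights)) / total
    return mean, sqrt(variance)


if __name__ == "__main__":
    for m in (17, 18):
        mu, sigma = tunnel_strength_stats(m, 0.25)
        print(f"m={m}, a=1/4: mu={mu:.2f}, sigma={sigma:.2f}")
```

Under these assumptions the conditional distribution is again binomial, with success probability $2a/(1+a)$, which is why the mean simplifies to $2am/(1+a)$.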


Table 7. Summary of the observed (s), maximal (m) and expected (μ) tunnel strength,
and the standard deviation (σ).

                     a = 1/4           a = 1/2
Block    s     m     μ       σ         μ       σ
1        7     17    6.80    2.02      8.67    1.70
2        13    18    7.20    2.08      10.00   1.83
3        10    17    6.80    2.02      9.33    1.76
4        9     18    7.20    2.08      10.67   1.89


4.3 Cost Estimation 

4.3.1 A Formula for the Expected Cost 

We now estimate the cost of generating a near-collision block. Since the
bitconditions are concentrated on the first 16 working states and the tunnel
$\mathcal{T}_8$ is used, we assume that the algorithm can be broken down into the
following steps:

1. Generate a full differential path/generate a set of initial working states that
connects to the lower differential path.

2. Select $Q_1, \ldots, Q_{16}$ according to the path and tunnel requirements.

3 . Try to obtain a solution 6 up to step 24 with the help of tunnels and T$. Go 
back to step 2 and choose different Qi if it is not possible to obtain a solution 
and use early abort to reduce the cost of this step. 

4. Attempt to generate a solution for the whole path from our solution up to
step 24 using tunnel $\mathcal{T}_8$. We use early abort to some extent.

5. Check if the values for $\delta Q_{61}, \ldots, \delta Q_{64}$ are correct. If yes, we have a solution.

The expected cost of this algorithm is as follows. Let $C_{path}$ be the differential path
construction cost, and let the random variable $Z$ be the number of input pairs with
$\delta Q_{57} = \cdots = \delta Q_{60} = 2^{31}$ that we need to evaluate until we find an input pair
where $\delta Q_{61}, \ldots, \delta Q_{64}$ are as specified by the differential path. The expected cost
of finding a solution with the correct values for $\delta Q_{61}, \ldots, \delta Q_{64}$ is then

$$C_{block} = C_{path} + E[Z] \cdot 2^{13.6}.$$

The factor $2^{13.6}$ represents the measured average complexity of finding input
pairs with $\delta Q_{57} = \cdots = \delta Q_{60} = 2^{31}$ for Flame’s differential paths. This
complexity is very stable for all near-collision blocks, as the differential paths are
only varied in the first 16 steps, which don’t affect complexity, and the last few
steps, which are instead covered by $Z$. Hence, the expected complexity of finding
a near-collision block is $E[Z] \cdot 2^{13.6}$.

6 We say that a pair of inputs solves a path up to step $t$ if it agrees with the
bitconditions $q_{-3}, \ldots, q_t$ and with the $\delta Q_{t+1}$ from the differential path.

We give estimates for $E[Z]$ in the next section. As discussed in Sect. 4.1.2,
$C_{path}$ can be as low as $2^{29}$ MD5 compressions, which will be negligible compared
to the other parts of the attack.


4.3.2 Estimating the Expected Number of Attempts 

In this section, we estimate the expected number of input pairs with
$\delta Q_{57} = \cdots = \delta Q_{60} = 2^{31}$ that we have to generate until a solution for the
differential path is found. We call input pairs with $\delta Q_{57} = \cdots = \delta Q_{60} = 2^{31}$
admissible input pairs and we call the values for $\delta Q_{61}, \ldots, \delta Q_{64}$ that we want
the target.

Let $\mathcal{T}_{i,u_i,v_i,w_i}$ be the random variable that gives the target for Block $i$ with
parameters $u_i$, $v_i$ and $w_i$. Selecting a target is done by selecting the values
for $B_k, C_k \in \{-1, 0, 1\}$ as in Sect. 4.2.1. Let $Z_\tau$ be the random variable that
counts the admissible inputs we have to try until $\tau$ is solved and let $Z_{i,u_i,v_i,w_i}$
be the random variable obtained by first sampling $\tau \leftarrow \mathcal{T}_{i,u_i,v_i,w_i}$ and then
sampling $Z_\tau$. To compute the total expected cost, we need $E[Z_{i,u_i,v_i,w_i}]$. To
obtain an empirical estimate $A_{i,u_i,v_i,w_i}$, we repeat the following process until
a fixed number of targets is solved: we first sample $\tau \leftarrow \mathcal{T}_{i,u_i,v_i,w_i}$ and then
select random admissible inputs and message blocks until we find one that solves
the target.7 When the chosen number of targets is solved, we let the average
number of attempts to solve a target be our estimate $A_{i,u_i,v_i,w_i}$. We then obtain
$C_{i,u_i,v_i,w_i} = A_{i,u_i,v_i,w_i} \cdot 2^{13.6}$ as an estimate for the cost of solving the differential
path for Block $i$ with parameters $u_i$, $v_i$ and $w_i$. The simulation outcomes for the
four blocks are given in Tables 8, 9, 10 and 11.

To save time, we do not generate admissible inputs as in Sect. 4.3.1. Instead,
we select working states $Q_{57}, \ldots, Q_{60}$ and message words $m_0, \ldots, m_{15}$ at random
and compute $Q'_{57}, \ldots, Q'_{60}$ and $m'_0, \ldots, m'_{15}$ by applying the appropriate
arithmetic differentials. This procedure requires the assumption that the probability
of hitting the target does not change when we select $Q_{57}, \ldots, Q_{60}$ and message
words at random, which is justified by the pseudo-randomness of MD5.

Our estimate of the Birthday Search cost in Sect. 4.2.2 assumes that the
parameters $w_i$ and $u_4$ are maximal. For smaller parameter values, the cost must
be multiplied by a “Birthday Factor” $\mu_i$, which we give in Table 12.


4.3.3 Total Cost. 

Let us now combine our estimates for the cost of solving the paths for different
parameter settings with the Birthday Search complexity. We will calculate the
following costs:


7 Recall that we say that an input solves a differential path if there exists a setting
for the $X_k$ such that $\delta Q_{61}, \ldots, \delta Q_{64}$ are as described by the path.
Table 8. Estimated complexities for the first near-collision block.

$\log_2 C_{1,v_1,w_1}$   v_1 = -1   v_1 = 0   v_1 = 1   v_1 = 2   v_1 = 3
w_1 = 1                  24.1       23.7      23.6      23.4      22.9
w_1 = 2                  25.8       25.2      25.0      24.8      24.3
w_1 = 3                  27.9       27.2      26.8      26.5      26.1
w_1 = 4                  29.8       29.1      28.6      28.2      27.8
w_1 = 5                  30.4       29.7      29.2      28.7      28.3

Table 9. Estimated complexities for the second near-collision block.

$\log_2 C_{2,v_2,w_2}$   v_2 = -1   v_2 = 0   v_2 = 1   v_2 = 2   v_2 = 3   v_2 = 4
w_2 = 0                  35.4       34.8      34.7      34.6      34.6      34.6
w_2 = 1                  37.0       36.2      36.1      36.0      36.0      36.0
w_2 = 2                  39.2       38.2      37.9      37.8      37.8      37.8
w_2 = 3                  41.6       40.5      40.0      39.8      39.7      39.7
w_2 = 4                  44.0       42.9      42.4      42.0      41.8      41.8
w_2 = 5                  46.7       45.5      45.0      44.6      44.2      44.0
w_2 = 6                  49.3       48.1      47.6      47.2      46.8      46.4

Table 10. Estimated complexities for the third near-collision block.

            $\log_2 C_{3,w_3}$
w_3 = 0     32.3
w_3 = 1     34.3
w_3 = 2     36.4
w_3 = 3     38.1
w_3 = 4     38.3


- $C_{msg}$: expected cost when minimizing the message block construction cost.

- $C_{flame}$: expected cost when minimizing the message block construction cost
while keeping the parameters consistent with the observed paths.

- $C_{search}$: expected cost when minimizing the Birthday Search cost.

- $C_{min}$: minimal expected cost.

Firstly, for $C_{msg}$, we choose $w_1, \ldots, w_4$ to be as small as possible. We have to
balance the parameters $v_1$ and $v_2$ against $u_4$. Increasing $v_1$ and $v_2$ does not speed
up the message block construction by much, so we pick $v_1 = v_2 = -1$, which allows
us to pick $u_4 = 1$. The combined Birthday Factor for these parameters is $2^{11.0}$.
We therefore have

$$C_{msg} = 4 \cdot C_{path} + 2^{11.0} \cdot 2^{44.3} \cdot p^{-1/2} + 2^{24.1} + 2^{35.4} + 2^{32.3} + 2^{34.1} \approx 2^{55.8}$$
Table 11. Estimated complexities for the fourth near-collision block.

$\log_2 C_{4,u_4,w_4}$   u_4 = 0   u_4 = 1   u_4 = 2   u_4 = 3   u_4 = 4
w_4 = 1                  34.1      33.8      35.6      38.2      38.7
w_4 = 2                  35.2      35.0      36.7      39.4      39.8
w_4 = 3                  37.0      36.5      38.4      41.0      41.4
w_4 = 4                  38.8      38.4      40.2      42.7      43.8
w_4 = 5                  40.8      40.6      42.3      44.6      44.8
w_4 = 6                  43.0      42.4      43.6      46.9      47.8


Table 12. “Birthday Factors” for the four near-collision blocks.

Block i           1               2               3               4
$\log_2 \mu_i$    (5 - w_1)/2     (6 - w_2)/2     (4 - w_3)/2     (10 - w_4 - u_4)/2


where $C_{path}$ is the cost of constructing a full differential path and $p$ is the
probability that a collision is useful. We assume that $p \approx 1/2$ and use the fact that
$4 \cdot C_{path}$ can be negligible compared to the other costs (see Sect. 4.1.2).

For $C_{flame}$, we must choose minimal values for the $w_i$ that are compatible
with the differential paths. That is, we must take $w_1 = 4$, $w_2 = 3$, $w_3 = 4$ and
$w_4 = 1$. We have $v_1 \ge 1$ and $v_2 \ge 0$; therefore, we must have $u_4 \ge 3$. We can
minimize the cost by choosing $v_1$ and $v_2$ as large as possible and $u_4 = 4$. Then,
we have a Birthday Factor of $2^{4.5}$. With the same assumptions as before, this gives us

$$C_{flame} = 4 \cdot C_{path} + 2^{4.5} \cdot 2^{44.3} \cdot p^{-1/2} + C_{1,v_1,w_1} + C_{2,v_2,w_2} + C_{3,w_3} + C_{4,u_4,w_4} \approx 2^{49.3}$$

For $C_{search}$, we have a Birthday Factor of 1 and

$$C_{search} = 4 \cdot C_{path} + 2^{44.3} \cdot p^{-1/2} + 2^{28.3} + 2^{46.4} + 2^{38.3} + 2^{47.8} \approx 2^{48.4}$$

To minimize the total expected cost, we take $w_1 = 5$, $v_1 = 3$, $w_2 = 5$, $v_2 = 4$,
$w_3 = 4$, $w_4 = 5$ and $u_4 = 4$. Then, we have a Birthday Factor of $2^{1.0}$ and

$$C_{min} = 4 \cdot C_{path} + 2^{45.3} \cdot p^{-1/2} + 2^{28.3} + 2^{44.0} + 2^{38.3} + 2^{44.8} \approx 2^{46.6}$$

We now show that this cost is indeed minimal: 

Theorem 6. Given the values for $E[Z]$ from Sect. 4.3.2 and assuming that the
probability $p$ for a useful collision in the Birthday Search is 1/2, the expected
cost of the collision attack is equivalent to at least $C_{min} = 2^{46.6}$ executions of
MD5Compress. For suitably chosen parameters, this cost can be achieved.

Proof. We have already given parameters which show that the second part of
the theorem holds. To see that this parameter choice indeed gives us the minimal
cost, let us try to improve upon it. It is easy to see that the Birthday Factor $\mu$
must satisfy $1 < \mu \le 2^{1.5}$, for if $\mu = 1$, the attack complexity is $C_{search} > C_{min}$ and
if $\mu = 2^{2.0}$, the Birthday Search cost is already larger than $C_{min}$. If $\mu = 2^{0.5 \cdot k}$, we
can reduce the $w_i$ or $u_4$ parameters by $k$. Since Blocks 2 and 4 have the highest
complexity, this is where these reductions should be spent.

For $\mu = 2^{0.5}$, in order to improve upon $C_{min}$, we need to construct the
near-collision blocks with a cost $< 2^{45.8}$; for $\mu = 2^{1.0}$, the cost needs to be $< 2^{45.4}$;
and for $\mu = 2^{1.5}$, it needs to be $< 2^{44.2}$. It turns out that the only way these
constraints can be satisfied is by setting $\mu = 2^{1.0}$ and reducing the parameters
$w_2$ and $w_4$. But these are precisely the parameters that give us $C_{min}$. □
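
The cost totals in this section can be re-derived with a few lines of arithmetic. The sketch below is only an illustration under the text's assumptions: it uses the rounded $\log_2$ figures quoted above (Birthday Search cost $2^{44.3} \cdot p^{-1/2}$ for maximal parameters, per-block costs as they appear in the cost equations), the Birthday Factor exponents $(5 - w_1)/2$, $(6 - w_2)/2$, $(4 - w_3)/2$ and $(10 - w_4 - u_4)/2$, takes $p = 1/2$ and neglects $4 \cdot C_{path}$. All identifiers are our own.

```python
from math import log2

BIRTHDAY_LOG2 = 44.3  # Birthday Search cost for maximal parameters, without p^(-1/2)

# log2 of the four block costs as they appear in the three cost equations.
BLOCK_COSTS_LOG2 = {
    "msg": [24.1, 35.4, 32.3, 34.1],
    "search": [28.3, 46.4, 38.3, 47.8],
    "min": [28.3, 44.0, 38.3, 44.8],
}


def birthday_factor_log2(w1, w2, w3, w4, u4):
    """log2 of the combined Birthday Factor for the given carry parameters."""
    return (5 - w1) / 2 + (6 - w2) / 2 + (4 - w3) / 2 + (10 - w4 - u4) / 2


def total_cost_log2(factor_log2, block_costs_log2, p=0.5):
    """log2 of Birthday Search cost plus block construction costs (C_path neglected)."""
    birthday = 2.0 ** (factor_log2 + BIRTHDAY_LOG2) * p ** -0.5
    blocks = sum(2.0 ** c for c in block_costs_log2)
    return log2(birthday + blocks)


if __name__ == "__main__":
    settings = {
        "msg": (1, 0, 0, 1, 1),
        "search": (5, 6, 4, 6, 4),
        "min": (5, 5, 4, 5, 4),
    }
    for name, params in settings.items():
        factor = birthday_factor_log2(*params)
        total = total_cost_log2(factor, BLOCK_COSTS_LOG2[name])
        print(f"C_{name}: Birthday Factor 2^{factor:.1f}, total ~ 2^{total:.1f}")
```

Because the inputs are rounded to one decimal, the results agree with the quoted totals only up to rounding.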

The parameters for $C_{min}$ are consistent with the observed differential paths.
Assuming that our reconstruction is correct, we can conclude that the expected
cost of the collision attack used by the Flame authors is lower-bounded by $2^{46.6}$
calls to MD5Compress. However, it seems likely that the cost of the actual attack
was higher than $C_{min}$ since the observed number of carries is always lower than
with the $C_{min}$ parameters. Nevertheless, the actual collision attack might have been
faster in practice: since Birthday Search can be executed very cost-effectively on
massively parallel architectures (e.g., GPUs), it might be advantageous to shift a
larger part of the workload to the Birthday Search step.

The expected cost of the [27]-attack with four near-collision blocks is roughly
1/4 of the lower bound of the Flame attack; its expected cost is equivalent to
$2^{44.55}$ calls to MD5Compress (see [25, Sect. 3.7]). The cost of the Birthday Search
dominates the total cost.

5 Conclusion 

In this paper we have demonstrated for the first time that a cryptanalytic attack
can be reconstructed from a single output example, specifically, a single example
half of a collision pair. We have provided a complexity analysis proving a lower
bound for its cost. Furthermore, we showed that in terms of theoretical cost,
the Flame attack is less efficient than the [27]-attack, although it might achieve
a better real-world performance when the Birthday Search is performed on a
massively parallel architecture.

Our reverse-engineering of a yet unknown cryptanalytic attack seems to be
without precedent. As Flame was allegedly developed by some nation-state(s),
the example collision and its analysis in this work provide some insights into their
cryptanalytic knowledge and capabilities. With respect to the complexity, the
closest fit of attack parameters is equivalent to $2^{49.3}$ MD5 compressions, which
takes roughly 40,000 CPU core hours. That means that for, say, 3-day attempts to
succeed in reasonable time given the large number of required attempts, one
needs about 560 CPU cores, which is large but not unreasonable even for academic
research groups. With respect to cryptanalytic knowledge, there are no
indications at all of superior techniques; rather, various parts seem to be
suboptimal compared to the state of the art in the literature. In particular, it is clear
that one could do better using the state of the art in the literature, i.e., achieve a lower
theoretical complexity to craft a 4-block chosen-prefix collision (see Theorem 6),
and generate differential paths with significantly lower density of bitconditions in
negligible time (as previously observed [25]). Nevertheless, the apparent significant
resources more than make up for that, and it seems that a working attack that
succeeds in reasonable time was more important than optimizing the overall
attack using all state-of-the-art techniques.
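
The core-count figure follows from simple arithmetic, sketched below. The helper name is ours, and the only inputs are the two numbers from the text (40,000 CPU core hours, 3-day attempts); the exact quotient is slightly below the rounded "about 560" quoted above.

```python
from math import ceil


def cores_needed(core_hours: float, days: float) -> int:
    """Number of CPU cores needed to spend `core_hours` within `days` days."""
    return ceil(core_hours / (24 * days))


if __name__ == "__main__":
    print(cores_needed(40_000, 3))
```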

A Flame Differential Paths 

Here, we show the differential paths for all four Flame near-collision blocks;
see also Sect. 4.2.1. The column ‘Probability’ lists the theoretical unconditional
rotation probabilities from $\delta T_t$ to $\delta R_t$. If this rotation probability for this $\delta T_t$
is not maximal, we list the maximal possible rotation probability for this $\delta T_t$,
albeit for a different $\delta R_t$, in parentheses. In the next column ‘Cond. est.’, we
give empirical estimates for the probabilities of the rotations, conditioned on
the $Q_i$ satisfying their bitconditions (Tables 13, 14, 15 and 16).


Table 13. Differential path sections of the 1st near-collision block of Flame’s attack 


t 

Bitconditions q* [31] . . . q*[0] 

Probability 

Cond. est. 

2 

+0-0-.00 .-++00+- 0-1-+.1+ l+-0++~. 

0.247 (0.628) 

0.166 

3 

+010-000 .-+++0+1 + — . +~1+ -+-+++-. 

0.911 

1 

4 

-00-10+. . 11-+-0+ +++11— 0 -101-+0. 

0.381 (0.561) 

1 

5 

0-+-++-- ''0110+1- -110+0-0 -0001+1'' 

0.229 (0.435) 

1 

6 

++ +- + +++ ++++++++ 

0.425 (0.514) 

1 

7 

111. -Ill 1101011. 110-1001 +0100.00 

0.838 

1 

8 

00+0.111 10111101 -1101100 .1110011 

0.063 (0.444) 

0.171 

9 

. .0.1 -. . 0.10+. . . 0- 0. 

0.516 

0.563 

59 

+ 

1 

1 

60 

+ .11110 

1 

1 

61 

+ .11000 001.00 

0.992 

1 

62 

0 

1 

1 

1 

1 

+ 

1 

0.391 (0.609) 

0.427 

63 

64 

+ .?o??+ — +.+- 

+ + ++++++ . . +- . 

0.867 

0.855 
Table 14. Differential path sections of the 2nd near-collision block of Flame’s attack


t 

Bitconditions q t [31] . . . q*[0] 

Probability 

Cond. est. 

2 

.01.-011 00+-++0+ 0— +.— 0 ++10+0+0 

0.849 

0.492 

3 

..1.-+11 +001++~+ 01-+0110 0+1++0++ 

0.623 

0.833 

4 

..-.1-11 ++1-++-+ -1111— + ++0+-+-1 

0.100 (0.547) 

1 

5 

— 1~+1— 10-01011 0+10-1-+ 0-+++000 

0.399 (0.431) 

0.499 

6 

+-++++++ ++++ +- — + 

0.458 (0.518) 

1 

7 

0010-000 01111011 1011-111 10.10010 

0.961 

1 

8 

00000100 1111111+ -1001111 1-010111 

0.468 

0.673 

9 

. . .-1. . . .- 1 0. .1+. . . .1 

0.468 (0.469) 

0.495 

58 

+ 

1 

1 

59 

+ 0 

1 

1 

60 

+ o 1001. 110 

0.5 

0.507 

61 

- 100 . . .0 1. .1. 00+ 

0.496 

0.749 

62 

+ . . . . 1- -+++. + — 

0.972 

0.948 

63 

+ . . . .++- . . . + ???-. ?+- 

0.238 (0.270) 

0.262 

64 

+ 

1 

+ 

1 

1 

+ 

1 

1 








Table 15. Differential path sections of the 3rd near-collision block of Flame’s attack 


t 

Bitconditions q*[31] . . . q*[0] 

Probability 

Cond. est. 

2 

10-01110 +++1— + +10+ 0-0++++1 

0.404(0.408) 

0.374 

3 

-0-01~l+ +0+1—10 0-++~~.0 01+0+00. 

0.941 

1 

4 

— 0++-00 0-0+11++ ++-1-+10 -+00+-1. 

0.085(0.593) 

1 

5 

-1++-0-1 +1-00+1- +0++110- -1— 1+~~ 

0.776 

1 

6 

++ +- + +++ ++++++++ 

0.514 

1 

7 

1000-010 00.1010. 101-0101 +0001.00 

0.838 

1 

8 

11+1.101 01011100 -1000101 .1000011 

0.437 

0.0566 

9 

. .0.1 -. . 0.10+. . . 0- 0. 

0.516 

0.573 

58 

+ 

1 

1 

59 

+ 

1 

1 

60 

- o 1 

1 

1 

61 

-.0110.0 1 0 

0.496 

0.515 

62 

+ . .01. + 0 + 

0.498 

0.492 

63 

+ 

+ 

1 

1 

1 

•■0 

1 

1 

+ 

0.404 

0.396 

64 

. + ... + .- ... .++++ - + -... 
Table 16. Differential path sections of the 4th near-collision block of Flame’s attack


t 

Bitconditions q*[31] . . . q*[0] 

Probability 

Cond. est. 

2 

+— .-0-. -+1+0— 0 1+1-1-++ -1-00+— 

0.691 

0.757 

3 

+— 1-~1. .+100— + 10—1+0 0++-1 

0.309 

1 

4 

-010+-1. 10-1-01+ 0-000-1- 0+-10-1- 

0.574 

1 

5 

+00-+00~ 0++-11-0 +++0-111 01-+-100 

0.749 

1 

6 

+-++++++ ++++ +- — + 

0.518 

0.507 

7 

.111-110 01.010.0 0101-110 1101.011 

0.961 

0.735 

8 

11110110 0101000+ -0101111 0-100111 

0.032(0.476) 

0.0508 

9 

. . .-1. . . .- 1 0. .1+. . . .1 

0.468(0.469) 

0.522 

58 

_ 

1 

1 

59 


1 

1 

60 

+ o 00 

1 

1 

61 

+ 1 11 1 

0.496 

0.525 

62 

- - 10. . .-+ 

0.5 

0.493 

63 

+ - +-...?- 

0.500 

0.503 

64 

.... -++ +-....-. ..-. + ..+ 




B Message Blocks and IHVs

          Flame certificate                   Legitimate certificate
IHV_7     a262d0136907c960bb84d9d73b74732e    8262d01365179fa09bd4c9cf1b76732e
B_8       7f7b4b7bc6beeb3f9f983da38487547e    7f7b4b7bc6beeb3f9f983da38487547e
          728771254b6835ae65bd6c8fdc8dacc4    728771a54b6835ae65bd6c8fdc8dacc4
          e89892dedc5362f5726a2527a31246eb    e89892dedc5362f5726a2527a39246eb
          7f6d58cd3083d77a85b848e60e011168    7f6d58cd3083d77a85b848660e011168
IHV_8     63fc3d453bdacbc8826faa39cc7df2cc    43fc3dc5395c9d8a62719ab3ac7ff24e
B_9       657d53380b40f43b684359c13c05c340    657d53380b40f43b684359c13c05c340
          269d5197e2eb2eb8c2196e4e94463bd8    269d5117e2eb2eb8c2196e4e94463bd8
          d4fd0d00d168fadff3fa188a7c659bda    d4fd0d00d168fadff3fa188a7ce59ada
          23119f16a68b23248887226919c211ea    23119f16a68b2324888722e919c211ea
IHV_9     7aeea241ddd49e30b9ce4dab4b8e0ff4    7aeea241fc1490efb9ce4daa4b8e0ff4
B_10      9d3681adfbe88bd2d0eb06f21a868dc6    9d3681adfbe88bd2d0eb06f21a868dc6
          84f388c5e0d964c64895d4bed3544891    84f38845e0d964c64895d4bed3544891
          e66ce91e33971542eeb46d1f150b27dd    e66ce91e33971542eeb46d1f158b27dd
          08bb81deb6961639d926446a5fd16b3f    08bb81deb6961639d92644ea5fd16b3f
IHV_10    ac3aa31bd79e7f3a9b34ec0a850e3940    ac3aa39bee607f3c9bf6eb8c851039c2
B_11      1271dcf09962d2431458f86ef82235d2    1271dcf09962d2431458f86ef82235d2
          90f7fd936ac449b8cb0ce965a8f722b5    90f7fd136ac449b8cb0ce965a8f722b5
          f2051920ef2563c7b3974a823eb2e3ee    f2051920ef2563c7b3974a823e32e3ee
          b45ecb1db3598f8df47901b1b6688914    b45ecb1db3598f8df4790131b6688914
Abstract. In 2012, NIST standardized SHA-512/224 and SHA-512/256,
two truncated variants of SHA-512, in FIPS 180-4. These two hash func- 
tions are faster than SHA-224 and SHA-256 on 64-bit platforms, while 
maintaining the same hash size and claimed security level. So far, no 
third-party analysis of SHA-512/224 or SHA-512/256 has been published. 
In this work, we examine the collision resistance of step-reduced ver- 
sions of SHA-512/224 and SHA-512/256 by using differential cryptanaly- 
sis in combination with sophisticated search tools. We are able to generate 
practical examples of free-start collisions for 44-step SHA-512/224 and 
43-step SHA-512/256. Thus, the truncation performed by these variants 
on their larger state allows us to attack several more rounds compared to 
the untruncated family members. In addition, we improve upon the best 
published collisions for 24-step SHA-512 and present practical collisions 
for 27 steps of SHA-512/224, SHA-512/256, and SHA-512. 


Keywords: Hash functions • Cryptanalysis • Collisions • Free-start 
collisions • SHA-512/224 • SHA-512/256 • SHA-512 • SHA-2 

1 Introduction 

The SHA-2 family of hash functions is standardized by NIST as part of the 
Secure Hash Standard in FIPS 180-4 [21]. This standard is not superseded by 
the upcoming SHA-3 standard. Rather, the SHA-3 hash functions supplement 
the SHA-2 family. Thus, it is likely that the SHA-2 family will remain as ubiqui- 
tously deployed in the foreseeable future as it is now. Therefore, the continuous 
application of state-of-the-art cryptanalytic techniques for quantifying the secu- 
rity margin of hash functions of the SHA-2 family is of significant practical 
importance. 

In this work, we focus on the two most recent members of the SHA-2 family, 
SHA-512/224 and SHA-512/256. As already observed by Gueron et al. [10], using 
truncated SHA-512 variants like SHA-512/256 gives a significant performance 
advantage over SHA-256 on 64-bit platforms due to the doubled input block size. 
At the same time, the shorter 256-bit hash values are more economic, compatible
with existing applications, and offer the same security level as SHA-256. In
addition, the resulting chop-MD [5] structure of SHA-512/224 and SHA-512/256
with its wide-pipe structure provides cryptographic benefits over the standard
Merkle-Damgård [7,20] structure by prohibiting generic attacks like Joux’
multicollision attack [12], Kelsey and Kohno’s herding and Nostradamus attacks [13],
and Kelsey and Schneier’s second preimages for long messages [14].

However, no cryptanalysis dedicated to SHA-512/224 and SHA-512/256 has
been published so far. Therefore, we examine the effects of truncating the hash
value of SHA-512. We show that due to this truncation, practical free-start
collisions for 43-step SHA-512/256 and 44-step SHA-512/224 are possible. Moreover,
we improve upon the previous best collisions for 24-step SHA-512 [11,23] and
show collisions for 27 steps of SHA-512, SHA-512/224, and SHA-512/256. Since
all of our results are practical, we provide examples of colliding message pairs
for every attack. Our results are summarized in Table 1 together with previously
published collision attacks.

Table 1. Best published collision attacks on the SHA-512 family.

Hash size   Type                        Steps   Complexity    Reference
all         collision                   24/80   practical     [11,23]
            collision                   27/80   practical     Sect. 4.3
            semi-free-start collision   38/80   practical     [9]
            semi-free-start collision   39/80   practical     Sect. 4.1
512         free-start collision        57/80   $2^{255.5}$   [17]
384         free-start collision        40/80   $2^{183}$     [17]
256         free-start collision*       43/80   practical     Sect. 4.2
224         free-start collision*       44/80   practical     Sect. 4.2

* without padding.


Related Work. No dedicated cryptanalysis of SHA-512/224 or SHA-512/256
has been published so far. However, there are a number of results targeting
SHA-512. The security of SHA-512 against preimage attacks was first studied by
Aoki et al. [1]. They presented MITM preimage attacks on 46 steps of the hash
function. This was later extended to 50 steps by Khovratovich et al. [15]. However,
due to the wide-pipe structure of SHA-512/224 and SHA-512/256, these
attacks do not carry over to SHA-512/224 and SHA-512/256.

The currently best known practical collision attack on the SHA-512 hash
function is for 24 steps. It was published independently by Indesteege et al. [11]
and by Sanadhya and Sarkar [23]. Both attacks are trivial extensions of the
attack strategy of Nikolić and Biryukov [22], which applies to both SHA-256
and SHA-512. Recently, Eichlseder et al. [9] demonstrated how to extend
these attacks to get semi-free-start collisions for SHA-512 reduced to 38 steps
with practical complexity. Furthermore, second-order differential collisions for
SHA-512 up to 48 steps with practical complexity have been shown by
Yu et al. [27]. We want to note that all these practical collision attacks on
SHA-512 are also applicable to its truncated variants.
Additionally, Li et al. showed in [17] that particular preimage attacks on
SHA-512 can also be used to construct free-start collision attacks for the
step-reduced hash function and its truncated variants. They show a free-start collision
for 57-step SHA-512 and 40-step SHA-384. Both attacks are only slightly faster
than the respective generic attacks.


Outline. The remainder of the paper is organized as follows. We describe the
design of the SHA-2 family in Sect. 2. Then, we briefly explain our attack strategy
and discuss the choice of suitable starting points for our attacks in Sect. 3. The
actual attacks on step-reduced SHA-512/224 and SHA-512/256 are presented in
Sect. 4.

2 Description of SHA-512 and Other SHA-2 Variants 

The SHA-2 family of hash functions is specified by NIST as part of the 
Secure Hash Standard (SHS) [21]. The standard defines two main algorithms, 
SHA-256 and SHA-512, with truncated variants SHA-224 (based on SHA-256) 
and SHA-512/224, SHA-512/256, and SHA-384 (based on SHA-512). In addi- 
tion, NIST defines a general truncation procedure for arbitrary output lengths 
up to 512 bits. Below, we first describe SHA-512, followed by its truncated vari- 
ants SHA-512/224 and SHA-512/256 that this paper is focused on. Finally, the 
main differences to SHA-256 and SHA-224 are briefly discussed. 

SHA-512. SHA-512 is an iterated hash function that pads and processes the
input message using $t$ 1024-bit message blocks $m_j$. The 512-bit hash value is
computed using the compression function $f$:

\[ h_0 = IV, \qquad h_{j+1} = f(h_j, m_j) \quad \text{for } 0 \le j < t. \]

The hash output is the final 512-bit chaining value $h_t$.

In the following, we briefly describe the compression function $f$ of SHA-512.
It basically consists of two parts: the message expansion and the state update
transformation. A more detailed description of SHA-512 is given by NIST [21].

We use $+$ (or $-$) to denote addition (or subtraction) modulo $2^{64}$; $\oplus$ (or $\wedge$) is
bitwise exclusive-or (or bitwise and) of 64-bit words, and $\ggg n$ (or $\gg n$) denotes
rotate-right (or shift-right) by $n$ bits.


Padding and Message Expansion. The message expansion of SHA-512 splits
each 1024-bit message block into 16 64-bit words $M_i$, $i = 0, \ldots, 15$, and expands
these into 80 expanded message words $W_i$ as follows:




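\[
W_i = \begin{cases}
M_i & 0 \le i < 16, \\
\sigma_1(W_{i-2}) + W_{i-7} + \sigma_0(W_{i-15}) + W_{i-16} & 16 \le i < 80.
\end{cases}
\tag{1}
\]

The functions $\sigma_0(x)$ and $\sigma_1(x)$ are given by

\[
\begin{aligned}
\sigma_0(x) &= (x \ggg 1) \oplus (x \ggg 8) \oplus (x \gg 7), \\
\sigma_1(x) &= (x \ggg 19) \oplus (x \ggg 61) \oplus (x \gg 6).
\end{aligned}
\]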

State Update Transformation. We use the alternative description of the 
SHA-512 state update by Mendel et al. [18], which is illustrated in Fig. 1. 



Fig. 1. The state update transformation of SHA-512.


The state update transformation starts from the previous 512-bit chaining
value $h_j = (A_{-1}, \ldots, A_{-4}, E_{-1}, \ldots, E_{-4})$ and updates it by applying the step
functions 80 times. In each step $i = 0, \ldots, 79$, one 64-bit expanded message word
$W_i$ is used to compute the two state variables $E_i$ and $A_i$ as follows:

\[ E_i = A_{i-4} + E_{i-4} + \Sigma_1(E_{i-1}) + \mathrm{IF}(E_{i-1}, E_{i-2}, E_{i-3}) + K_i + W_i, \tag{2} \]

\[ A_i = E_i - A_{i-4} + \Sigma_0(A_{i-1}) + \mathrm{MAJ}(A_{i-1}, A_{i-2}, A_{i-3}). \tag{3} \]

For the definition of the step constants $K_i$, we refer to the standard
document [21]. The bitwise Boolean functions IF and MAJ used in each step are
defined by

\[
\begin{aligned}
\mathrm{IF}(x, y, z) &= (x \wedge y) \oplus (x \wedge z) \oplus z, \\
\mathrm{MAJ}(x, y, z) &= (x \wedge y) \oplus (y \wedge z) \oplus (x \wedge z),
\end{aligned}
\]

and the linear functions $\Sigma_0$ and $\Sigma_1$ are defined as follows:

\[
\begin{aligned}
\Sigma_0(x) &= (x \ggg 28) \oplus (x \ggg 34) \oplus (x \ggg 39), \\
\Sigma_1(x) &= (x \ggg 14) \oplus (x \ggg 18) \oplus (x \ggg 41).
\end{aligned}
\]

After the last step of the state update transformation, the previous chaining
value is added to the output of the state update (Davies-Meyer construction).
The result of this feed-forward sum is the chaining value $h_{j+1}$ for the next
message block (or the final hash value $h_t$):

\[ h_{j+1} = (A_{79} + A_{-1}, \ldots, A_{76} + A_{-4}, E_{79} + E_{-1}, \ldots, E_{76} + E_{-4}). \tag{4} \]
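To make the alternative description concrete, the following minimal Python sketch (our illustration, not code from the paper or from any tool mentioned in it) implements Eqs. (1)-(4) directly in terms of the words $A_i$, $E_i$, and $W_i$. The helper names `compress`, `sha512`, `icbrt`, and `first_primes` are hypothetical. The constants $K_i$ and the IV are recomputed from the cube and square roots of the first primes, as the standard [21] defines them. The `steps` parameter is an assumption about how step reduction is modelled here: the first `steps` steps are applied, followed by the feed-forward.

```python
import hashlib
import struct
from math import isqrt

MASK = (1 << 64) - 1  # all additions are modulo 2^64


def rotr(x, n):
    """Rotate the 64-bit word x right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK


def sigma0(x): return rotr(x, 1) ^ rotr(x, 8) ^ (x >> 7)
def sigma1(x): return rotr(x, 19) ^ rotr(x, 61) ^ (x >> 6)
def Sigma0(x): return rotr(x, 28) ^ rotr(x, 34) ^ rotr(x, 39)
def Sigma1(x): return rotr(x, 14) ^ rotr(x, 18) ^ rotr(x, 41)
def IF(x, y, z): return (x & y) ^ (x & z) ^ z
def MAJ(x, y, z): return (x & y) ^ (y & z) ^ (x & z)


def icbrt(n):
    """Floor of the cube root of n, by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def first_primes(count):
    """First `count` primes by trial division."""
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found


PRIMES = first_primes(80)
K = [icbrt(p << 192) & MASK for p in PRIMES]       # 64 fractional bits of cbrt(p)
IV = [isqrt(p << 128) & MASK for p in PRIMES[:8]]  # 64 fractional bits of sqrt(p)


def compress(h, block, steps=80):
    """Compression function on chaining value h (8 words) and a 128-byte block."""
    W = list(struct.unpack(">16Q", block))
    for i in range(16, steps):                                   # Eq. (1)
        W.append((sigma1(W[i - 2]) + W[i - 7] + sigma0(W[i - 15]) + W[i - 16]) & MASK)
    A = {-1: h[0], -2: h[1], -3: h[2], -4: h[3]}
    E = {-1: h[4], -2: h[5], -3: h[6], -4: h[7]}
    for i in range(steps):
        E[i] = (A[i - 4] + E[i - 4] + Sigma1(E[i - 1])
                + IF(E[i - 1], E[i - 2], E[i - 3]) + K[i] + W[i]) & MASK   # Eq. (2)
        A[i] = (E[i] - A[i - 4] + Sigma0(A[i - 1])
                + MAJ(A[i - 1], A[i - 2], A[i - 3])) & MASK                # Eq. (3)
    n = steps                                                    # feed-forward, Eq. (4)
    return ([(A[n - k] + A[-k]) & MASK for k in range(1, 5)]
            + [(E[n - k] + E[-k]) & MASK for k in range(1, 5)])


def sha512(message):
    """Full SHA-512 built from compress(), with the standard padding."""
    padded = message + b"\x80"
    padded += b"\x00" * ((-len(padded) - 16) % 128)
    padded += (8 * len(message)).to_bytes(16, "big")
    h = IV
    for j in range(0, len(padded), 128):
        h = compress(h, padded[j:j + 128])
    return struct.pack(">8Q", *h)
```

With all 80 steps, `sha512(b"abc")` can be compared against `hashlib.sha512(b"abc").digest()` as a sanity check of the description; a call such as `compress(IV, block, steps=39)` evaluates a step-reduced variant under the modelling assumption stated above.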

SHA-512/256 and SHA-512/224. These truncated variants of SHA-512 differ
only in their initial values and a final truncation to 256 or 224 bits, respectively.
The rest of the algorithmic description remains exactly the same. The message
digest of SHA-512/256 is obtained by omitting the output words $E_{79} + E_{-1}$,
$E_{78} + E_{-2}$, $E_{77} + E_{-3}$, and $E_{76} + E_{-4}$ of the last compression function call.
SHA-512/224 additionally omits the 32 least significant bits of $A_{76} + A_{-4}$.
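The effect of the truncation can be illustrated with a small sketch. The function name `truncated_digest` and the sample words below are hypothetical; the only assumption is that the eight output words are serialized big-endian in the order of Eq. (4), so that the digest is simply the leftmost bits.

```python
import struct


def truncated_digest(h_words, bits):
    """Leftmost `bits` bits of the 8 output words (A words first, then E words)."""
    if bits not in (224, 256, 384, 512):
        raise ValueError("unsupported output length")
    return struct.pack(">8Q", *h_words)[: bits // 8]


# Arbitrary sample output words, not taken from any real computation.
out = [0x1111111111111111 * k for k in range(1, 9)]

# Differences confined to the four E words vanish after truncation to 256 bits.
out_e = list(out)
for idx in range(4, 8):
    out_e[idx] ^= 0xDEADBEEF

# A difference in the low 32 bits of the fourth word additionally
# vanishes after truncation to 224 bits, but not for 256 bits.
out_a = list(out_e)
out_a[3] ^= 0x20
```

Here `truncated_digest(out, 256)` equals `truncated_digest(out_e, 256)`, and `truncated_digest(out, 224)` equals `truncated_digest(out_a, 224)`, while the 256-bit digests of `out` and `out_a` differ.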

SHA-256 and SHA-224. SHA-256 and SHA-512 are closely related. Thus, we 
only point out properties of SHA-256 which differ from SHA-512: 

- The wordsize is 32 instead of 64 bits. 

- IV and Ki are the 32 most significant bits of the respective SHA-512 value. 

- The step function is applied 64 instead of 80 times. 

- The linear functions $\sigma_0$, $\sigma_1$, $\Sigma_0$, and $\Sigma_1$ use different rotation values.

SHA-224 is a truncated variant of SHA-256 with different IV, in which the output
word $E_{60} + E_{-4}$ is omitted.
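The relation between the SHA-256 and SHA-512 constants can be checked numerically. The sketch below is our illustration (helper names are hypothetical). It relies on the definition in the standard [21]: the step constants are the leading fractional bits of the cube roots of the first primes, and the IV words are the leading fractional bits of the square roots of the first eight primes.

```python
from math import isqrt


def icbrt(n):
    """Floor of the cube root of n, by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def first_primes(count):
    """First `count` primes by trial division."""
    found, n = [], 2
    while len(found) < count:
        if all(n % p for p in found):
            found.append(n)
        n += 1
    return found


def frac_root_bits(p, root, bits):
    """First `bits` fractional bits of the square (root=2) or cube (root=3) root of p."""
    scaled = isqrt(p << (2 * bits)) if root == 2 else icbrt(p << (3 * bits))
    return scaled & ((1 << bits) - 1)


primes = first_primes(80)
K512 = [frac_root_bits(p, 3, 64) for p in primes]
K256 = [frac_root_bits(p, 3, 32) for p in primes[:64]]
IV512 = [frac_root_bits(p, 2, 64) for p in primes[:8]]
IV256 = [frac_root_bits(p, 2, 32) for p in primes[:8]]

# Each SHA-256 value is the upper half of the corresponding SHA-512 value.
upper_halves_match = (
    all(K256[i] == K512[i] >> 32 for i in range(64))
    and all(IV256[i] == IV512[i] >> 32 for i in range(8))
)
```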

3 Attack Strategy 

Starting from the ground-breaking results of Wang et al. [25,26], the search tech- 
niques used for practical collisions have been significantly improved, hitting their 
current peak in the attacks on SHA-256 [2,19] and SHA-512 [9,27]. In spite of 
all achieved improvements, the top-level attack strategy has remained essentially 
the same. At first, a suitable starting point for the search must be determined to 
define the search space and hopefully make the ensuing search process feasible. 
The search itself usually involves two phases: The search for a suitable differen- 
tial characteristic, and the message modification phase to determine a collision- 
producing message pair for this characteristic. The search for this characteristic 
and message pair can either be done by hand or, for more complex functions like 
SHA-2, using an automatic search tool. We use a heuristic search tool based on a 
guess-and-determine strategy, which we briefly describe in Sect. 3.1. Afterwards, 
we discuss the choice of suitable starting points in Sect. 3.2.

3.1 Guess-and-Determine Search Tool

To search for differential characteristics and colliding message pairs, we use an 
automatic search tool, which implements a configurable heuristic guess-and- 
determine search strategy. Roughly, the tool is partitioned into two separate, 
but closely interacting parts: The representation of the analyzed cryptographic 
primitive and the search procedure. 


Representation. The tool internally represents differences at bit level, which allows
it to store all possible stages, from a completely unrestricted bit over signed
differences down to exact values. Thus, the same tool can be used in the search
for a characteristic and in the search for a message pair. The conditions are
grouped in words representing the internal variables of the hash function. These
words can then be connected with any operations (typically bitwise functions or
modular additions) to define the hash function.


Search. The search procedure uses the bitwise conditions as variables, and 
attempts to find a solving assignment with the help of a heuristic guess-and- 
determine strategy [8], similar to SAT solvers. The following steps are repeated 
until a solution is found: 

- Guess: Pick a bit and guess its value (e.g., no difference, or a specific assign- 
ment). 

- Determine: The previous guess influences other connected bit conditions. 
Determine these effects, which might result in further refinement of other bit 
conditions, or a contradiction. 

- Backtrack: If a contradiction is detected, resolve this conflict by undoing 
previous guesses and replacing them with other choices. 

This simple approach alone is not sufficient to go through the whole search 
space, so numerous refinements have been proposed to fine-tune this method. 
These include the detection of two-bit conditions [18], backtracking strategies, 
and a look-ahead approach to guide the search [9]. Additionally, SHA-2-specific 
heuristics and strategies [18, 19] have been proposed, deciding which parts of the 
state to guess with higher priority. 
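The guess, determine, and backtrack steps can be sketched on a toy problem. The following Python code is only an illustration of the general strategy and is not the search tool used in this work: it handles plain bit values instead of generalized bit conditions, supports only constraints of the form $a \oplus b = c$, and guesses variables in a fixed order. All names are hypothetical.

```python
def determine(assign, constraints):
    """Propagate XOR constraints a ^ b = c; return False on a contradiction."""
    changed = True
    while changed:
        changed = False
        for names in constraints:
            vals = [assign[v] for v in names]
            unknown = [v for v, x in zip(names, vals) if x is None]
            if not unknown:
                if vals[0] ^ vals[1] != vals[2]:
                    return False            # contradiction detected
            elif len(unknown) == 1:
                known = [x for x in vals if x is not None]
                assign[unknown[0]] = known[0] ^ known[1]   # forced value
                changed = True
    return True


def search(assign, constraints, order):
    """Guess-and-determine with backtracking; returns a full assignment or None."""
    assign = dict(assign)                   # work on a copy so guesses can be undone
    if not determine(assign, constraints):
        return None
    free = [v for v in order if assign[v] is None]
    if not free:
        return assign
    for guess in (0, 1):                    # guess the next undetermined bit
        trial = dict(assign)
        trial[free[0]] = guess
        result = search(trial, constraints, order)
        if result is not None:
            return result
    return None                             # both guesses failed: backtrack


# Toy instance: a^b = 1, b^c = 1, a^c = d, a^d = 1, with the constant bit one = 1.
start = {"a": None, "b": None, "c": None, "d": None, "one": 1}
rules = [("a", "b", "one"), ("b", "c", "one"), ("a", "c", "d"), ("a", "d", "one")]
solution = search(start, rules, order=["a", "b", "c", "d"])
```

In this instance the first guess `a = 0` propagates to a contradiction in the last constraint, so the search backtracks and succeeds with `a = 1`.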


3.2 Finding Starting Points for SHA-2 

To model SHA-2 as a satisfiability problem for the search tool, we need to introduce
suitable intermediate variables. Based on the alternative description from
Sect. 2, we only use the words $A_i$ and $E_i$ of the state, plus the words $W_i$ of
the message expansion. Figure 2 illustrates the update rules for $A$, $E$, and $W$ by
highlighting the input words for updating each word: Each row represents one
of the 80 step iterations, with its three state words $A_i$, $E_i$, and $W_i$.

Fig. 2. Update rules to compute $A_i$, $E_i$, and $W_i$ (■) from other state words.


Local Collisions. All our results are based on “local collisions” in the message
expansion: the (expanded) message words in the middle steps are carefully selected
so that the differences cancel out in as many consecutive steps as possible
in the forward and backward expansion, i.e., the first and last few expanded
message words contain no differences. The $t$ middle steps with differences can
induce differences in the $A_i$ and $E_i$ words. However, the $W_i$ words can be used
to achieve zero difference in the last 4 of the $t$ words $E_i$, and in the last 8 of the
$t$ words $A_i$. This is necessary to obtain words with zero difference in the very
last 4 steps of the state update and thus in the output chaining value.

As an example, the starting point for the 27-step collisions for SHA-256 [18]
allows differences in expanded message words $W_7$, $W_8$, $W_{12}$, $W_{15}$, and $W_{17}$, as
well as state words $E_7, \ldots, E_{13}$ and $A_7, \ldots, A_9$. The exact bitwise signed differences
are chosen during the search such that any potential differences in
$W_{19}$, $W_{22}$, $W_{23}$, $W_{24}$, as well as $E_{14}, \ldots, E_{17}$ and $A_{10}, \ldots, A_{13}$ cancel out. The
resulting starting point is illustrated in Fig. 3a. We show in Sect. 4.3 how the
same starting point can be used for SHA-512.

The semi-free-start collision starting point covering the most steps so far is 
for 38 steps of SHA-256 [19] and SHA-512 [9], with a local collision spanning 
t = 18 steps. Considering the large number of steps, the number of expanded 
message words with differences and cancellations is remarkably low: only 6 words 
with differences, and 6 words imposing cancellation conditions. 

To find candidates for a higher number of steps, we enumerated all possible 
selections of active message words (more precisely, of some t < 20 intermediate 
expanded message words, the “core words” of the local collision) and investigated 
the forward and backward expansion under certain assumptions: the t core words 
are chosen freely, according to the message expansion rule; in the forward and 
backward expansion, if at least 2 of the input words have differences, they are 
assumed to cancel out, while a single input word with difference never cancels 
out. Criteria for selecting suitable candidates then include a low number t of 
spanned steps and a low number of required cancellation constraints. The best 
(consistent) result for 39 steps, spanning t = 19 steps with 9 cancellations, is 
given in Fig. 3b.

(a) 27-step collision of SHA-256 [18] and SHA-512 (Sect. 4.3).

(b) 39-step semi-free-start collision of SHA-512 (Sect. 4.1).

Fig. 3. SHA-2 starting points: words with differences (■) and cancellations.
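The propagation assumption used when enumerating candidates can be sketched in a few lines of Python. This is a simplified illustration and not the enumeration code behind Fig. 3: it only models the forward expansion from the 16 message words (no backward expansion and no freely chosen expanded core words), and the function name `expand_activity` is hypothetical. A word with exactly one active input becomes active; a word with two or more active inputs is assumed to cancel and is recorded as a cancellation constraint.

```python
def expand_activity(active_message_words, steps):
    """Propagate 'has a difference' flags through the message expansion.

    active_message_words: indices i < 16 of message words with a difference.
    Returns (sorted active indices, indices where a cancellation is required).
    """
    active = set(active_message_words)
    cancellations = []
    for i in range(16, steps):
        inputs = (i - 2, i - 7, i - 15, i - 16)     # inputs of W_i in Eq. (1)
        count = sum(1 for j in inputs if j in active)
        if count == 1:
            active.add(i)            # a single active input never cancels
        elif count >= 2:
            cancellations.append(i)  # assumed to cancel: costs one constraint
    return sorted(active), cancellations


# Message-word differences of the 27-step starting point (Fig. 3a).
active_27, cancelled_27 = expand_activity({7, 8, 12, 15}, steps=27)
active_28, cancelled_28 = expand_activity({7, 8, 12, 15}, steps=28)
```

Under this model, `active_27` is `[7, 8, 12, 15, 17]` and `cancelled_27` is `[19, 22, 23, 24]`, which matches the words named for the 27-step starting point. With one more step, $W_{27}$ has a single active input ($W_{12}$) and therefore becomes active.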


Semi-Free-Start Collisions and Collisions. The discussed starting points
are targeted to find semi-free-start collisions, that is, different messages $m$, $m'$
and an IV $h_0$ such that $f(h_0, m) = f(h_0, m')$. However, they can also be used
for hash function collisions with the original IV $h_0$ by trading the freedom of the
IV for freedom in the message words.

In order to find hash function collisions, the first few message words $W_i$ must
retain sufficient freedom (i.e., they should not be constrained by conditions from
the message expansion for cancelling differences) to allow matching the correct
IV value. Ideally, this means that the first 8 message words $W_0, \ldots, W_7$ are free
of any conditions (no differences, but also not constrained by conditions from
other message words connected via the message expansion). If the $W_i$ differences
are sparse enough overall, it can also be sufficient to have at least 5 words
$W_0, \ldots, W_4$ free of conditions by providing the remaining freedom with a
two-block approach [19].

The starting points of Fig. 3a and 3b both have at least 7 message words
free of differences in the beginning. However, the local collision shown in Fig. 3b
spans over $t = 19$ steps. Thus, the first message words are constrained by many
conditions, leaving not enough freedom to match the correct IV. In contrast, the
11-step local collision shown in Fig. 3a provides enough freedom in the first 7
message words to be used in a single-block collision attack [18].

4 Collision Attacks for Truncated SHA-512 Variants

The hash functions SHA-512/224 and SHA-512/256 differ from SHA-512 in their
IV and a final processing step, which truncates the 512-bit state to 224 or 256
bits, respectively. Consequently, the semi-free-start collisions demonstrated for
SHA-512 [9] are also valid for these truncated versions (since the IV is
non-standard anyway in this attack scenario). In this section, we first improve these
results by providing 39-step semi-free-start collisions for SHA-512 and its variants.
We then extend this result to free-start collisions for 43-step SHA-512/256
and 44-step SHA-512/224. By free-start collisions, we mean two messages $m$, $m'$
and two IVs $h_0$, $h'_0$ such that the hash values of $m$ (under IV $h_0$) and $m'$ (under
IV $h'_0$) collide. Note that free-start collisions are not equivalent to collisions of the
compression function for truncated SHA-2 versions, since the truncated output
bits of the last compression function call may contain differences. Additionally,
we present collisions for 27 steps of SHA-512, SHA-512/224, and SHA-512/256.

4.1 Semi-Free-Start Collisions

We use the 39-step starting point from Fig. 3b. Previous work showed that sparse
differences, particularly in the $A$ words, are essential for the success probability of
the message modification phase. For this reason, we additionally require that in
6 words between $A_8$ and $A_{18}$, namely $A_{11}$, $A_{12}$, $A_{13}$, $A_{14}$, $A_{15}$, and $A_{17}$, differences
also cancel out. The five consecutive zero-difference words in $A$ also force $E_{15}$
to zero difference. These additional requirements are already marked in Fig. 3b
(hatched area).

The first task for the search procedure with the solving tool is to fix a suitable
signed characteristic. Compared to the previously published 38-step SHA-512
semi-free-start collision [9], the local collision for our starting point spans 19
steps (compared to previously 18) and has 9 (previously 6) active expanded
message words. Cancellations are also required in 9 (previously 6) expanded
message words. This increases the necessity for very sparse differences in $A_i$ and
$W_i$ in steps 16-26. For this reason, we require a single-bit difference in $W_{26}$, $W_{17}$,
and $A_{18}$, and very low Hamming weights for the other words. We finally found
a characteristic with at most two active bits in almost all words of $A$ and $W$
(except $A_9$, $A_{10}$, $W_{11}$, $W_{12}$).

After the characteristic is fixed, we need to find a complying message pair. We
start by guessing the dense parts in $A_i$ and $E_i$, hoping that the sparser conditions
in the later steps are fulfilled probabilistically. Since the dense parts are already
almost fully determined by the characteristic and the sparse parts pose only
few conditions, a message pair is easily found. The result is a semi-free-start
collision valid for all SHA-512 variants. We give an example in Appendix A in
Table 4a.

4.2 Free-Start Collisions 

Free-start collisions are a generalization of semi-free-start collisions, so the 39-step
results obtained in the previous section give a first result for SHA-512/224 and
SHA-512/256. However, we can take advantage of the truncated output bits to add
several more steps. If we add another step in the beginning or in the end, the existing
difference pattern remains unchanged, but there will be differences in the word
$W_0$ (computable via backward expansion, which includes $W_{0+9} = W_9$, the previous
$W_8$ from Fig. 3b) or in the new word $W_{39}$ (via the normal forward expansion, which
includes $W_{39-15} = W_{24}$), respectively. These, in turn, can imply differences in $E_{-4}$
or in $A_{39}$ and $E_{39}$, which translates to differences in the IV (turning semi-free-start
into free-start results, and included in the hash value via the feed-forward)
or directly in the compression function output, respectively.

The advantage of adding steps in the beginning is that it is possible to limit
the additional differences in the state update words to $E$, and keep $A$ free of new
differences. Any differences in $E_{-1}, \ldots, E_{-4}$ will be added to the compression
function output with the final feed-forward, but the corresponding words of the
result are truncated, so the hash outputs still collide.


Free-Start Collisions for 43-Step SHA-512/256. Since SHA-512/256 truncates
the last 4 output words of the compression function call ($E_{79} + E_{-1}$,
$E_{78} + E_{-2}$, $E_{77} + E_{-3}$, and $E_{76} + E_{-4}$), differences in $E_{-1}, \ldots, E_{-4}$ are acceptable
for a free-start collision. This observation allows us to add 4 additional steps in
the beginning of the 39-step starting point from Fig. 3b. Shifting the characteristic
“downwards” by 4 steps causes the previous message words $W_{12}, \ldots, W_{15}$ to
turn into new expanded message words $W_{16}, \ldots, W_{19}$; in particular, this affects
the difference in the previous word $W_{12}$. To determine a compatible difference
pattern for the new first 4 words, the message expansion can be computed backwards
from the new words $W_4, \ldots, W_{19}$ via

\[ W_i = W_{i+16} - \sigma_1(W_{i+14}) - W_{i+9} - \sigma_0(W_{i+1}). \]

It turns out that all 4 new words will contain differences ($W_3$ from $W_{3+9} = W_{12}$;
$W_2$ from $W_{2+1} = W_3$ and $W_{2+14} = W_{16}$; $W_1$ from $W_{1+1} = W_2$ and
$W_{1+14} = W_{15}$; and $W_0$ from $W_{0+1} = W_1$, $W_{0+14} = W_{14}$, and $W_{0+16} = W_{16}$).
However, similar to steps 27-30, the state words $A_i$ and $E_i$ can be kept free of
differences for 4 steps. To achieve this, the search tool needs to find differences in
the IV words $E_{-4}, \ldots, E_{-1}$ to cancel out those in $W_0, \ldots, W_3$ when computing
$E_0, \ldots, E_3$. The resulting starting point is given in Fig. 4a.
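The backward computation of the message expansion can be sketched as follows. This is a minimal Python illustration with hypothetical function names; it works on concrete 64-bit values rather than on differences and checks that the backward rule inverts the forward rule of Eq. (1) for a window $W_4, \ldots, W_{19}$.

```python
import random

MASK = (1 << 64) - 1


def rotr(x, n):
    """Rotate the 64-bit word x right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK


def sigma0(x): return rotr(x, 1) ^ rotr(x, 8) ^ (x >> 7)
def sigma1(x): return rotr(x, 19) ^ rotr(x, 61) ^ (x >> 6)


def expand_forward(message_words, total):
    """Forward expansion: W_0..W_15 are the message words, then Eq. (1)."""
    W = list(message_words)
    for i in range(16, total):
        W.append((sigma1(W[i - 2]) + W[i - 7] + sigma0(W[i - 15]) + W[i - 16]) & MASK)
    return W


def expand_backward(window, start):
    """Given 16 consecutive words W_start..W_{start+15}, recover W_0..W_{start+15}."""
    W = {start + k: w for k, w in enumerate(window)}
    for i in range(start - 1, -1, -1):
        W[i] = (W[i + 16] - sigma1(W[i + 14]) - W[i + 9] - sigma0(W[i + 1])) & MASK
    return [W[i] for i in range(start + 16)]


rng = random.Random(1)                       # fixed seed for reproducibility
message = [rng.getrandbits(64) for _ in range(16)]
W = expand_forward(message, 20)              # W_0 .. W_19
recovered = expand_backward(W[4:20], start=4)
```

The list `recovered` equals `W`, i.e., the four prepended words $W_0, \ldots, W_3$ are fully determined by $W_4, \ldots, W_{19}$.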

For the search procedure with the solving tool, we fixed the signed differences
of steps 12-30 to the same values as the 39-step SHA-512 semi-free-start collision
of Sect. 4.1. Then, to complete the characteristic, we first search for a valid
solution for the dense part of the middle steps ($A_i$ and $E_i$ in steps 13-16, and
$E_i$ in steps 17-27), and finally fix the corresponding message words $W_i$ in steps
13-17, which determines the complete state, including the dense differences in
the prepended steps and IV.

The search only takes seconds on a standard computer; an example for a
free-start collision is given in Appendix A in Table 3a.


(a) 43-step SHA-512/256.

(b) 44-step SHA-512/224.

Fig. 4. Potential free-start starting points (differences ■ and cancellations).


Free-Start Collisions for 44-Step SHA-512/224. A very similar strategy
can be employed to extend the previous 43-step free-start collision by another
step for SHA-512/224. Prepending an additional step shifts the difference of
previous word $E_{-1}$ to $E_0$, which in turn requires a cancellation in $A_0$ and a
difference in $A_{-4}$, as illustrated in Fig. 4b. However, only the least significant
32 bits of the corresponding compression function output word are truncated.
Furthermore, this output word is computed from $A_{-4}$ via modular addition, so
even differences only in the lower 32 bits can possibly cause differences in the
untruncated output bits.

Fortunately, the underlying characteristic of signed differences as used for the
39-step SHA-512 semi-free-start collision is well compatible with our constraints:
The difference in $A_{-4}$ needs to cancel that in $W_4$ in a modular addition (via $E_0$,
by Eqs. (3) and (2) or Fig. 2, since all other involved words have zero difference).
This difference of $W_4$, in turn, is dictated by that in $W_{13}$ (by the update rule
for $W_{20}$, where again all other involved words have zero difference). None of
these equalities involves any of the bitwise functions $\sigma$, $\Sigma$, MAJ, or IF. Thus, the
modular difference in $A_{-4}$ must be the same as that in $W_{13}$, which is already
fixed by the underlying characteristic to a modular difference of +32. Written as
bitwise differences, this will translate to a single-bit difference (in the sixth least
significant bit) with probability $\frac{1}{2}$ (which does not carry over to the untruncated
bits of the final output with overwhelming probability). Indeed, the example for
a free-start collision given in Appendix A in Table 2a only displays this single-bit
difference in $A_{-4}$ (and no carries in the output bits).
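The relation between the modular difference +32 and its bitwise form can be checked with a short sketch. The function names are hypothetical, and the exhaustive count uses 16-bit toy words purely to keep the enumeration small; the argument is the same for 64-bit words. Adding 32 flips only bit 5 exactly when bit 5 of the word is zero, and the carry reaches bit 32 or above only if bits 5 to 31 are all set.

```python
def xor_difference(x, delta, bits=64):
    """XOR difference between x and x + delta (mod 2^bits)."""
    mask = (1 << bits) - 1
    return (x & mask) ^ ((x + delta) & mask)


def single_bit_fraction(delta, bits):
    """Fraction of all `bits`-bit words whose XOR difference equals delta itself."""
    hits = sum(1 for x in range(1 << bits) if xor_difference(x, delta, bits) == delta)
    return hits / (1 << bits)


def reaches_upper_half(x):
    """True if adding 32 to the 64-bit word x changes any of bits 32..63."""
    return xor_difference(x, 32) >> 32 != 0


half = single_bit_fraction(32, 16)            # exhaustive over 16-bit toy words
no_carry = reaches_upper_half(0x20)           # carry stops at bit 6
long_carry = reaches_upper_half(0xFFFFFFE0)   # bits 5..31 all set: carry reaches bit 32
```

Here `half` is 0.5, `no_carry` is `False`, and `long_carry` is `True`. For a uniformly random word, a carry chain through bits 5 to 31 occurs with probability $2^{-27}$.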
4.3 Collisions

So far, the best practical collisions found for SHA-512 are those for 24 steps, 
proposed independently by Sanadhya and Sarkar [23] and Indesteege et al. [11],
together with 24-step collisions for SHA-256. While the results for SHA-256 have 
since been improved to 27 [18], 28 [19] (both practical), and finally 31 steps [19] 
(theoretical attack with almost practical complexity), no such improvements 
have been proposed for SHA-512 so far. The main reason for this seems to be 
the doubling in state size from SHA-256 to SHA-512; this larger search space 
increases the difficulty of the problem for the search tools. 


Starting Point for SHA-512. Since the message expansion is essentially the
same for all SHA-2 variants (except for different word sizes and rotation values,
of course), the SHA-256 starting points can theoretically also be used for
SHA-512. However, the resulting search complexity is different. For our results,
we used the 27-step starting point (based on a local collision over the $t = 11$ steps
7-17), as illustrated in Fig. 3a. Just as the 39-step semi-free-start starting point
(Fig. 3b), it requires that differences cancel in $E$ in 4 of the $t$ steps ($E_{14}, \ldots, E_{17}$)
and in $A$ in the 4 previous steps ($A_{10}, \ldots, A_{13}$), as well as in several steps of the
message expansion.

Finding a solution from this starting point requires significantly more effort 
than for SHA-256. Of course, we also tried to expand our search to the closely 
related 28-step starting point, which adds an additional step in the beginning 
of the 27-step version. However, with the additional constraints imposed on the 
message expansion by this added step we could not find any suitable (reasonably 
sparse) characteristics. 

In contrast to the results from Sect. 4.2, since the IV needs to exactly match 
the original IV, we were not able to take advantage of the final truncation to sim- 
plify the search process, or add additional steps. We first search a characteristic 
for SHA-512, and then try to use it to match the different IVs for SHA-512/224 
and SHA-512/256. 


Search Strategy. The search progresses in several stages, as illustrated in 
Fig. 5: 

1. Fix Signed Characteristic: 

(a) Find Candidate Characteristic (Fig. 5a): First fix the signed differences
of the message expansion $W$ (5 words) and state update $A$ (3 words).
Since the word $W_{17}$ poses conditions on the first few message words, whose
freedom we will later need to match the IV, we focus on keeping its signed
difference as sparse as possible, with only few difference bits. With much
lower priority, also determine the differences in the state update words $E$
(7 words) to complete the signed characteristic. The characteristic is very
dense in $E$, but this only has limited influence on the success of the IV
matching phase.

Fig. 5. Stages of the 27-step collision search (guessed values and differences,
derived values, and previously fixed values and differences).


(b) Verify Dense Parts (Fig. 5b): Fully determine the values of $A$ and $E$ in
the densest steps 7-9 to verify the validity of the candidate characteristic.
If necessary, fix any remaining free bits of $A$ and $E$ in steps 10-11. This
fully determines $A_3, \ldots, A_{11}$, $E_7, \ldots, E_{11}$, and $W_{11}$.

To maneuver the search process in the large search space and detect con- 
tradictions as soon as possible, we need to apply the look-ahead strategies 
previously employed for semi-free-start collisions on SHA-512 [9] in this stage 
(with 16 look-ahead candidates per guess). 

2. Message Modification to Match IV: Starting from the best signed char- 
acteristics of the previous stage, with the correct IV inserted, find a solution 
message pair step by step: 

(a) Match IV (Fig. 5c): Fix the values in the more difficult, heavily constrained
words first ($W_{10}$, $W_9$, $W_8$, $W_7$). Choosing $W_{10}$ and $W_9$ also determines
$A_2$ and $A_1$ (via $E_6$ and $E_5$). Together with $W_7$, $W_8$, and the IV,
this determines all values in steps 0-11.

(b) Finalize Message for Sparse Parts (Fig. 5d): Choosing the 4 remaining
message words $W_{12}, \ldots, W_{15}$ allows us to satisfy the remaining, sparse parts
of the characteristic in steps 12-26 with high probability.

Unlike the other stages, guesses are not made randomly here, but systemat- 
ically word-by-word. Since most conditions are from modular additions, we 
always start from the least significant bits and proceed towards the more 
significant bits. This last stage needs to be repeated for each IV separately, 
which takes some hours on a single CPU per target IV. 


Results. Our results for collisions for 27-step SHA-512/224, SHA-512/256, and 
SHA-512 are given in Appendix A in Tables 2b, 3b, and 4b, respectively. 

Acknowledgments. This research (or a part of this research) is supported by Cryp- 
tography Research and Evaluation Committee (CRYPTREC) and by the Austrian 
Research Promotion Agency (FFG) and the Styrian Business Promotion Agency (SFG) 
under grant number 836628 (SeCoS). 


A Examples 

An example for the semi-free-start collisions of Sect. 4.1 is given in Table 4a.
Results for the free-start collisions of Sect. 4.2 are given in Tables 2a and 3a, and
for the collisions of Sect. 4.3 in Tables 2b, 3b, and 4b.

Table 2. Results for SHA-512/224.

(a) Example of a free-start collision for 44 steps of SHA-512/224.

h_0    fef65b64d3694995 959fbfb82ed84eb1 1d9e855642e62ef2 335cc6d027695d91
       921d197e5cfa2803 e26c6eb26163a692 9ff3cf4d26f1de78 5323942861d9139a
h'_0   fef65b64d3694995 959fbfb82ed84eb1 1d9e855642e62ef2 335cc6d027695db1
       a712860cdcfa1ff8 470749bbf7628f44 20cdfd694df67216 8e07b5fa2c7fedf0
Δh_0   0000000000000000 0000000000000000 0000000000000000 0000000000000020
       350f9f72800037fb a56b2709960129d6 bf3e32246b07ac6e dd2421d24da6fe6a
m      7a19df6089d00684 03ed2a0d0c29e00e 36c91e35f681fbb8 bb2b47428aeff294
       dce94ccc981d39a3 44230f73cf56d9ef e9d46b26b44950c8 550bed4b9419741c
       58a98894206e00de f3448a6f761d384d 9ae59f3a3bcc5bba ece85d5c77be431b
       6e3cf817e9376cc7 b74a2a43c0b96c93 7c5b51d6fe2a0c26 5a9868e5bf2e422d
m*     5e031bbe28b2d027 ded424ef85255cc3 ad2f514be0830c1f a635dab40aeffa9f
       dce94ccc981d3983 44230f73cf56d9ef e9d46b26b44950c8 550bed4b9419741c
       58a98894206e00de f3448a6f761d384d 9ae59f3a3bcc5bba ece85d5c77be431b
       6e3cf817e9376cc7 b74a2a43c0b96cb3 5c5b51d6fe2a0c36 5a9868e5bf2e420d
Δm     241ac4dea162d6a3 dd390ee2890cbccd 9be64f7e1602f7a7 1d1e9df68000080b
       0000000000000020 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000020 2000000000000010 0000000000000020
h_1    e309edf68f4d89b8 5c356e0359eb0dab 76b4a45ec3c2cd25 8bd0955d


(b) Example of a collision for 27 steps of SHA-512/224.

m      20dbf13a352116a9 295506e205afd435 abfe4826742c1a1a 279f07c7813dd9be
       47da77c701a98858 25aec1349d486501 37a992a15616ea31 e2b122ecf19e90d3
       2fff6025dc03dd67 032c261d740f459e 2e2599bd6e7e74df d490bd22815eb494
       72fedf1f607df6e3 87fc91fcfb7397fd e647b1b499eee17f 2dff8e493cbc8a4c
m*     20dbf13a352116a9 295506e205afd435 abfe4826742c1a1a 279f07c7813dd9be
       47da77c701a98858 25aec1349d486501 37a992a15616ea31 5cc1250cb19e90d3
       203fdfe5dc03dd66 032c261d740f459e 2e2599bd6e7e74df d490bd22815eb494
       f0bc01167075f6eb 87fc91fcfb7397fd e647b1b499eee17f d3f8fe713d7c8a4c
Δm     0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 be7007e040000000
       0fc0bfc000000001 0000000000000000 0000000000000000 0000000000000000
       8242de0910080008 0000000000000000 0000000000000000 fe07703801c00000
h_1    65b11e66e48da563 1b70d12da92e2dba 8f338768bb95601b 60b995bb

Table 3. Results for SHA-512/256.

(a) Example of a free-start collision for 43 steps of SHA-512/256.

h_0    159b52516f10f30d 546b2042f240afee f25339b24c441edf d62c698666558242
       e5a9e39861fbd81d d2138eacc20d5224 a332c16df23609fb 73f78341dfd7a4e5
h'_0   159b52516f10f30d 546b2042f240afee f25339b24c441edf d62c698666558242
       e5a9e39861fbd83d 72e259ce420d5a0f 4db37906cc361264 ae579d9e0275b446
Δh_0   0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000020 a0f1d7628000082b ee81b86b3e001b9f dda01edfdda210a3
m      cfbec86f1cf6821e dd3343c25aad835a 2a08612b753f3d6b b328d40d2c624ef7
       b3e51f8a3a63bd6f 4abdf96375bbf609 a8c5c1f784672e86 a78e2aa625830d4b
       169dcb5039bf3d9f fbcc43ffebd8ae47 1b3eaefccf5c6a46 f668a2a728851b4e
       374601ea44422bdb 2ca290d26a23a02f 6685babbfdcb5e22 e000111457201fd4
m*     ee37d77210586a56 b2a4122800ad72cf 89399609f53f3560 b328d40d2c624ed7
       b3e51f8a3a63bd6f 4abdf96375bbf609 a8c5c1f784672e86 a78e2aa625830d4b
       169dcb5039bf3d9f fbcc43ffebd8ae47 1b3eaefccf5c6a46 f668a2a728851b4e
       374601ea44422bfb 0ca290d26a23a03f 6685babbfdcb5e02 e07e151457202055
Δm     21891f1d0caee848 6f9751ea5a00f195 a331f7228000080b 0000000000000020
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000020 2000000000000010 0000000000000020 007e040000003f81
h_1    1d7041bbbffa676a 03d8c440d9246b9d 20ce2d17c5b0b2c4 7e6e4d33a7f54afd


(b) Example of a collision for 27 steps of SHA-512/256.

m      306b0c2ebe7c1341 c8b55d4df1c5f4fe b91a173aeceb818a 33b5977f9b46e58b
       6c6d5a4f87f1364f 1b7e33249d4acf4f b7f784ecdcaefe1f a33edafe7afc0452
       dfc0200932c2b9df faec7d05e3518e56 ec2e19a7ee867396 d490bd22815eb494
       72fedf1f887df303 f95891f08483da25 c327d0afa2c4f902 2c5f0c0806a4e298
m*     306b0c2ebe7c1341 c8b55d4df1c5f4fe b91a173aeceb818a 33b5977f9b46e58b
       6c6d5a4f87f1364f 1b7e33249d4acf4f b7f784ecdcaefe1f 1d4edd1e3afc0452
       d0009fc932c2b9de faec7d05e3518e56 ec2e19a7ee867396 d490bd22815eb494
       f0bc01169875f30b f95891f08483da25 c327d0afa2c4f902 d2587c300764e298
Δm     0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 0000000000000000 be7007e040000000
       0fc0bfc000000001 0000000000000000 0000000000000000 0000000000000000
       8242de0910080008 0000000000000000 0000000000000000 fe07703801c00000
h_1    fcba5c8faf05fd68 c676b8f17b5daae3 6233801174b7fd01 0ff72ab4a869c54f

Table 4. Results for SHA-512.

(a) Example of a semi-free-start collision for 39 steps of SHA-512. 


h_0
eccf3da189dd9668 b1ec21a4fd53b8d8 609ce4465f772770 adf4e7738e2978f6
8edd237ea50eebc9 231b3af0102a926d db45e613e8d2fd52 ad384433420073f6

m
a0ec9872cfffe63c df5c6a2b59f4c453 f2bea3763fc8fa7a 6a47e8ff0a995116
fa59232e8b617048 4c9690984c084498 28bee8f5701eab16 8d57686ecbdce623
3879318f901ff782 72644b0ca55a6142 6cb281dab11480b4 4a8198441f401ff2
5ffd956ed11a2b5f 9a640988d68287d3 74942df792f2637f b2819dc61f772d4f

m*
a0ec9872cfffe63c df5c6a2b59f4c453 f2bea3763fc8fa7a 6a47e8ff0a995116
fa59232e8b617048 4c9690984c084498 28bee8f5701eab16 8d57686ecbdce623
3879318f901ff7a2 52644b0ca55a6152 6cb281dab1148094 4aff9c441f402073
6001956ed11a2a5f 9a640988d68287d3 74942df792f2637f b2819dc61f772d4f

Δm
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000020 2000000000000010 0000000000000020 007e040000003f81
3ffc000000000100 0000000000000000 0000000000000000 0000000000000000

h_1
3aa73bfae7b82789 711f2024cf0f636e 0c6965f707279a53 8227fba8617aa955
fdd9e2ca8c4d0038 57db244560d7b70b 08ec5698343353c0 9e9b739ee307ea92


(b) Example of a collision for 27 steps of SHA-512. 


m
537e7a4986aa2fce 11206ad0306c752b 90124a9e1c9b0ce2 8c14e0356fd26f5f
fd3ef90ea3e4366f 35d8c2ba58abd92f b23e476632eca1fd e2b122ef46649b73
dfc020070e628f37 7acf74d1d1007558 6c6359a6fe7fe2f0 d490bd22815eb494
72fedf1f807df6f3 a8585af19b6dd9d1 3d2053b0c295522b 2d970e0e52a49081

m*
537e7a4986aa2fce 11206ad0306c752b 90124a9e1c9b0ce2 8c14e0356fd26f5f
fd3ef90ea3e4366f 35d8c2ba58abd92f b23e476632eca1fd 5cc1250f06649b73
d0009fc70e628f36 7acf74d1d1007558 6c6359a6fe7fe2f0 d490bd22815eb494
f0bc01169075f6fb a8585af19b6dd9d1 3d2053b0c295522b d3907e3653649081

Δm
0000000000000000 0000000000000000 0000000000000000 0000000000000000
0000000000000000 0000000000000000 0000000000000000 be7007e040000000
0fc0bfc000000001 0000000000000000 0000000000000000 0000000000000000
8242de0910080008 0000000000000000 0000000000000000 fe07703801c00000

h_1
d838f1d2ae4bf185 3fc837ae9bbc28d4 6b2f2977f58a9697 99c48839f0e8bdca
c9c0a86fed1d921a 2f823b1fa1913751 3ba170b902c6da30 9c4e5807be51a7e7

Abstract. We explore time-memory and other tradeoffs for memory-hard
functions, which are supposed to impose significant computational
and time penalties if less memory is used than intended. We analyze
three finalists of the Password Hashing Competition: Catena, which was
presented at Asiacrypt 2014, yescrypt, and Lyra2.

We demonstrate that Catena's proof of tradeoff resilience is flawed,
and attack it with a novel precomputation tradeoff. We show that using
M^{4/5} memory instead of M we have no time penalties and reduce the
AT cost by a factor of 25. We further generalize our method to a
wide class of schemes with predictable memory access. For a wide class
of data-dependent schemes, which address memory unpredictably, we
develop a novel ranking tradeoff and show how to decrease the
time-memory and the time-area product by significant factors. We then
apply our method to yescrypt and Lyra2, also exploiting the iterative
structure of their internal compression functions.

The designers confirmed our attacks and responded by adding a new 
mode for Catena and tweaking Lyra2. 


Keywords: Password hashing • Memory-hard • Catena • Tradeoff • 
Cryptocurrency • Proof-of-work 


1 Introduction 

Memory-hard functions are a fast-emerging trend that has become a popular
remedy against hardware-equipped adversaries in various applications:
cryptocurrencies, password hashing, key derivation, and more generic
Proof-of-Work constructions. It was motivated by the rise of various attack
techniques, which can be commonly described as optimized exhaustive search.
In cryptocurrencies, the hardware arms race made Bitcoin mining [29] on
regular desktops tremendously inefficient, as the best mining rigs spend
30,000 times less energy per hash than x86 desktops/laptops^1. This causes
major centralization of the mining efforts, which goes against the democratic
philosophy behind the Bitcoin design. This in turn prevents wide adoption and
use of such cryptocurrency in the economy, limiting the current activities in
this area to mining and hoarding, with negative effects on the price. Restoring
the ability of CPU or GPU mining by the use of memory-hard proof-of-work
functions may have a dramatic effect on cryptocurrency adoption and use in the
economy, for example as a form of decentralized micropayments [15]. In password
hashing, numerous leaks of hash databases triggered the wide use of GPUs [3,34]
and FPGAs [27] for password cracking with a dictionary. In this context,
constructions that intensively use a lot of memory seem to be a countermeasure.
The reasons are that memory operations have very high latency on GPUs and that
memory chips are quite large and thus expensive in FPGA and ASIC environments
compared to a logic core, which computes, e.g., a regular hash function.

1 The estimate comes from the numbers given in [6]: the best ASICs make 2^32 hashes
per joule, whereas the most efficient laptops can do 2^17 hashes per joule.

Memory-intensive schemes, which bound the memory bandwidth only, were
suggested earlier by Burrows et al. [8] and Dwork et al. [17] in the context of
spam countermeasures. It was quickly realized that to be a real countermeasure,
the amount of memory must also be bounded [18], so that memory cannot
be easily traded for computations, time, or other resources that are cheaper
on a certain architecture. Schemes that are resilient to such tradeoffs are called
memory-hard [21,30]. In fact, the constructions in [18] are so strong that even
a tiny memory reduction results in a huge computational penalty.

Disadvantage of Classical Constructions and New Schemes. The provably
tradeoff-resilient superconcentrators [32] and their applications in [18,19] have
serious performance problems. They are terribly slow for modern memory sizes. A
superconcentrator requiring N blocks of memory makes O(N log N) calls to F. As a
result, filling, e.g., 1 GB of RAM with 256-bit blocks would require dozens of calls
to F per block (C log N calls for some constant C). This would take several minutes
even with a lightweight F and is thus intolerable for most applications like web
authentication or cryptocurrencies. Using less memory, e.g., several megabytes,
does not effectively prohibit hardware adversaries.

It has been an open challenge to construct a reasonably fast and
tradeoff-resilient scheme. Since the seminal paper by Dwork et al. [18] the first
important step was made by Percival, who suggested scrypt [30]. The idea of scrypt
was quite simple: fill the memory by an iterative hash function and then make a
pseudo-random walk on the blocks using the block value as an address for the
next step. However, the entire design is somewhat sophisticated, as it employs a
stack of subfunctions and a number of different crypto primitives. Under certain
assumptions, Percival proved that the time-memory product is lower bounded
by some constant. The scrypt function is used inside the cryptocurrency Litecoin [4]
with a 128 KB memory parameter and is now adopted as an IETF standard for
key derivation [5]. scrypt is a notable example of data-dependent schemes, where
the memory access pattern depends on the input, and this property enabled
Percival to prove some lower bound on the adversary's costs. However, the
performance and/or the tradeoff resilience of scrypt are apparently not sufficient
to discourage hardware mining: the Litecoin ASIC miners are more efficient than
CPU miners by a factor of 100 [1].
The need for even faster, simpler, and possibly more tradeoff-resilient
constructions was further emphasized by the ongoing Password Hashing
Competition [2], which has recently selected 9 finalists out of the 24 original
submissions. Notable entries are Catena [20], just presented at Asiacrypt 2014 with
a security proof based on [26], and yescrypt and Lyra2 [25], which both claim
performance up to 1 GB/sec and which were quickly adopted within a cryptocurrency
proof-of-work [7]. The tradeoff resilience of these constructions has not been
challenged so far. It is also unclear how possible tradeoffs would translate to the cost.

Our Contributions. We present a rigorous approach and a reference model to
estimate the amortized costs of password brute-force on special hardware using
full-memory algorithms or time-space tradeoffs. We show how to evaluate the
adversary's gains in terms of area-time and time-memory products via the
computational complexity and latency of the algorithm.

Then we present our tradeoff attacks on the latest versions of Catena and
yescrypt, and the original version of Lyra2, and generalize them to wide
classes of data-dependent and data-independent schemes. For Catena we
analyze the faster Dragonfly mode and show that the original security proof for
it is flawed and that the computation-memory product can be kept constant while
reducing the memory. For ASIC-equipped adversaries we show how to reduce the
area-time product (abbreviated further as AT) by a factor of 25 under reasonable
assumptions on the architecture. The attack algorithm is then generalized
to a wide class of data-independent schemes as a precomputation method.

Then we consider data-dependent schemes and present the first generic
tradeoff strategy for them, which we call the ranking method. Our method easily
applies to yescrypt and then to the second phase of Lyra2, both taken with
minimally secure time parameters. We further exploit the incomplete diffusion in the
core primitives of these designs, which reduces the time-memory and time-area
products for both designs.

Altogether, we show how to decrease the time-memory product by a factor
of 2 for yescrypt and a factor of 8 for Lyra2. Our results are summarized in
Table 1. To the best of our knowledge, our methods are the first generic attacks
so far on data-dependent or data-independent schemes^2.

Related Work. So far there have been only a few attempts to develop tradeoff
attacks on memory-hard functions. A simple tradeoff for scrypt has been known
in folklore and was recently formalized in [20]. Alwen and Serbinenko analyzed
a simplified version of Catena in [9]. The designers of Lyra2 and Catena attempted
to attack their own designs in the original submissions [20,25]. A simple analysis
of Catena has been made in [16].

Paper Outline. We introduce necessary definitions and metrics in Sect. 2. We 
attack Catena-Dragonfly in Sect. 3 and generalize this method in Sect. 4. Then 
we present a generic ranking algorithm for data-dependent schemes in Sect. 5 


2 The full version of this paper is available at [14]. 
Table 1. Our tradeoff gains on Catena, yescrypt and Lyra2 with minimal secure
parameters, 2^30 memory bytes and reference hardware implementations (Sect. 2). TM loss
is the maximal factor by which we can reduce the time-memory product compared to
the full-memory implementation. AT loss is the maximal factor for time-area product
reduction. Compactness of TM and AT is the maximal memory reduction factor which
does not increase the TM or AT, resp., compared to the default implementation.

                  Catena-Dragonfly   Generic 1-pass   yescrypt      Lyra2 v1
Time              T = 3M             T = M            T = 4/3 M     T = 2M
                  Section 3          Section 5        Section 6     Appendix A
TM loss           200                1.28             2.1           8
AT loss           25                 1.28             2.1           3
TM compactness    64                 4                5.8           16
AT compactness    64                 4                4.5           5

and attack yescrypt with this method in Sect. 6. The attack on Lyra2 is quite 
sophisticated and we leave it for Appendix A. 

2 Preliminaries 

2.1 Syntax 

Let Q be a hash function that takes a fixed-length string I as input and outputs
tag H. We consider functions that iteratively fill and overwrite memory blocks
X[1], X[2], ..., X[M] using a compression function F:

    $X[i_j] = f_j(I), \quad 1 \le j \le s;$    (1)

    $X[i_j] = F(X[\phi_1(j)], X[\phi_2(j)], \ldots, X[\phi_k(j)]), \quad s < j \le T,$    (2)

where $\phi_i$ are some indexing functions referring to some already filled blocks and
$f_j$ are auxiliary hash functions (similar to F) filling the initial s blocks for some
positive s.

We say that the function makes p passes over the memory, if T = pM. 
Usually p and M are tunable parameters which are responsible for the total 
running time and the memory requirements, respectively. 
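
As an illustration of this syntax, the following Python sketch fills and overwrites M blocks for T = pM steps. It is only a toy under stated assumptions: SHA-256 stands in for F, a single auxiliary call plays the role of $f_1$ (so s = 1), and `example_phi` is a hypothetical data-independent indexing function that is not taken from any of the analyzed schemes.

```python
import hashlib

def F(*blocks: bytes) -> bytes:
    """Stand-in compression function (SHA-256 over the concatenated inputs)."""
    return hashlib.sha256(b"".join(blocks)).digest()

def fill_memory(I: bytes, M: int, passes: int, phi) -> bytes:
    """Fill and overwrite M blocks for T = passes * M steps.

    phi(j, M) returns the indices of already filled blocks read at step j.
    """
    X = [b""] * M
    X[0] = F(b"init", I)                 # s = 1: one auxiliary call on the input
    T = passes * M
    for j in range(1, T):
        refs = phi(j, M)
        X[j % M] = F(*(X[r] for r in refs))
    return X[(T - 1) % M]                # the tag H is the last block written

def example_phi(j: int, M: int):
    # Hypothetical indexing: the previous block and one fixed earlier block.
    return [(j - 1) % M, (j // 2) % M]

tag = fill_memory(b"password|salt", M=16, passes=2, phi=example_phi)
```

Here p = 2 passes over M = 16 blocks give T = 32 calls to the compression function, and the memory requirement is M blocks regardless of p.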

2.2 Time-Space Tradeoff 

Let A be an algorithm that computes Q. The computational complexity C(A) is
the total number of calls to F and $f_j$ by A, averaged over all inputs to Q. We
do not consider possible complexity amortization over successive calls to A. The
space complexity S(A) is the peak number of blocks (or their equivalents) stored
by A, again averaged over all inputs to Q. Suppose that A can be represented as
a directed acyclic graph with vertices being calls to F. Then the latency L(A) is
the length of the longest chain in the graph from the input to the output. Therefore,
L(A) is the minimum time needed to run A assuming unlimited parallelism and
instant memory access.
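
The difference between computational complexity and latency can be made concrete with a small sketch. The function below is an illustrative helper (its name and the dictionary encoding of the graph are assumptions of this example): it takes the dependency graph of calls to F and returns the number of calls needed for the output together with the length of the longest chain.

```python
def complexity_and_latency(deps, output):
    """Return (C, L) for the DAG of F-calls needed to produce `output`.

    deps maps a call to the calls whose outputs it reads; calls that read
    only the input have no entry.
    """
    depth = {}
    stack = [output]
    while stack:
        v = stack[-1]
        pending = [u for u in deps.get(v, ()) if u not in depth]
        if pending:
            stack.extend(pending)        # resolve predecessors first
        else:
            depth[v] = 1 + max((depth[u] for u in deps.get(v, ())), default=0)
            stack.pop()
    return len(depth), depth[output]

T = 8
chain = {j: [j - 1] for j in range(1, T)}          # X[j] depends on X[j-1]
tree = {6: [4, 5], 4: [0, 1], 5: [2, 3]}           # a wide, shallow graph
chain_C_L = complexity_and_latency(chain, T - 1)
tree_C_L = complexity_and_latency(tree, 6)
```

For the sequential chain both values equal T, whereas the tree needs 7 calls but has latency 3, so parallel cores help only in the second case.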

A straightforward implementation of the scheme (1) results in an algorithm
with computational complexity T, latency L = T, and space complexity M.
However, it might be possible to compute Q using less memory. According to [24],
any function that is described by Eq. (1) and whose reference block indices $\phi_i(j)$
are known in advance can be computed using $c_k \frac{T}{\log T}$ memory blocks for some
constant $c_k$ depending on the number k of input blocks for F. Therefore, any
p-pass function can be computed using less than M = T/p memory for
sufficiently large M.

Let us fix some default algorithm A of Q with $(C_1, M_1, L_1)$ being the
computational and space complexities and the latency of A, respectively. Suppose that
there is a time-space tradeoff given by the family of algorithms^3 $\mathcal{B} = \{B_q\}$ that
compute Q using space $M_1/q$ for different q. The idea is to store only one of q memory
blocks on average and recompute the missing blocks whenever they are needed.
Then we define the computational penalty $CP_{\mathcal{B}}(q)$ as

    $CP_{\mathcal{B}}(q) = \frac{C(B_q)}{C_1}$

and the latency penalty $LP_{\mathcal{B}}(q)$ as

    $LP_{\mathcal{B}}(q) = \frac{L(B_q)}{L_1}$.

2.3 Attackers and Cost Estimates 

We consider the following attack. Suppose that Q with time and memory
parameters (T, M) is used as a password hashing function with I = (P, S), where P
is a secret password and S is a public salt. An attacker gets H and S (e.g., from
a database leak) and tries to recover P. He attempts a dictionary attack: given
a list L of the most probable passwords, he runs Q on every $P \in L$ and checks the
output.

Definition 1. Let $\Phi$ be a cost function defined over a space of algorithms. Let
also $Q_{T,M}$ be a hash function with a fixed algorithm $A_0$ (default algorithm). Then
$Q_{T,M}$ is called $(\alpha, \Phi)$-secure if for every algorithm B for $Q_{T,M}$

    $\Phi(B) > \alpha \Phi(A_0)$.

In other words, $Q_{T,M}$ cannot be computed cheaper than by the factor of $1/\alpha$.

The cost function is more difficult to determine. We suggest evaluating
amortized computing costs for a single password trial. Depending on the architecture,
the costs vary significantly for the same algorithm A. For ASIC-equipped
attackers, who can use parallel computing cores, it is widely suggested that the
costs can be approximated by the time-area product AT [9,11,28,35]. Here T is
the time complexity of the used algorithm and A is the sum of the area needed to
implement the memory cells and the area needed to implement the cores. Let the
area needed to implement one block of memory be the unit of area measurement.
Then in order to know the total area, we need the core-memory ratio $R_c$, which is
how many memory blocks we can place on the area taken by one core.

3 As well as A, the family $\mathcal{B}$ admits parallel implementations.

Suppose that the adversary runs algorithm $B_q$ using M/q memory and l
computing cores, thus having computational complexity $C_q = C(B_q)$. The running
time is lower bounded by the latency $L_q = L(B_q)$ of the algorithm. If $L_q < C_q/l$,
i.e. the computing cores cannot finish the work in minimum time, then the time
T can be approximated by $C_q/l$, and the costs are estimated as follows:

    $AT_{B_q}(l) = \left(l R_c + \frac{M}{q}\right) \cdot \frac{C_q}{l} = C_q \left(R_c + \frac{M}{ql}\right)$.

We see that the costs drop as l increases. Therefore, the adversary would be
motivated to push it to the maximum limit $C_q/L_q$. Thus we obtain the final
approximation of costs:

    $AT_{B_q} = C_q R_c + L_q \frac{M}{q}$.    (3)
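
The cost estimate can be written as a short function. This is a sketch of the formulas above, not a hardware model: the name `at_cost` and the optional `cores` argument are choices of this example, and the numbers used below are arbitrary illustrative values.

```python
def at_cost(C_q: float, L_q: float, M: float, q: float, R_c: float, cores=None) -> float:
    """Area-time cost of running B_q; with cores=None the optimal core count is used."""
    max_cores = C_q / L_q                # more cores cannot beat the latency bound
    l = max_cores if cores is None else min(cores, max_cores)
    area = l * R_c + M / q               # measured in memory blocks
    time = C_q / l
    return area * time

M = 2 ** 20
C_q, L_q, q, R_c = 12 * M, 3 * M, 4, 3000          # arbitrary example values
optimal = at_cost(C_q, L_q, M, q, R_c)
with_one_core = at_cost(C_q, L_q, M, q, R_c, cores=1)
with_two_cores = at_cost(C_q, L_q, M, q, R_c, cores=2)
```

With the optimal number of cores the function returns $C_q R_c + L_q M/q$, and the cost decreases monotonically as cores are added up to $C_q/L_q$.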

Here we assume unlimited memory bandwidth. Taking the bandwidth
restrictions into account is even more difficult, as they depend on the relative
frequency of the computing core and the memory as well as on the architecture of
the memory bus. Moreover, the memory bandwidth of the algorithm depends
on the implementation and is not easy to evaluate. We leave rigorous memory
bandwidth evaluation and restrictions for future work.

We recall that the value $R_c$ depends on the architecture, the function
F, and the block size. To give a concrete example, suppose that the block is
64 bytes and F is the Blake-512 hash function. We use the following reference
implementations^4:

- The 50-nm DRAM [22], which takes 550 mm^2 per GByte;

- The 65-nm Blake-512 [23], which takes about 0.1 mm^2.

Then the core-memory ratio is $\frac{0.1 \cdot 2^{24}}{550} \approx 3000$. For more lightweight hash
functions this ratio will be smaller.
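
The arithmetic behind this ratio can be checked in a few lines. The constants are the two reference figures quoted above; the variable names are chosen for this example only.

```python
DRAM_MM2_PER_GBYTE = 550.0      # 50-nm DRAM figure quoted above
CORE_MM2 = 0.1                  # 65-nm Blake-512 figure quoted above
BLOCK_BYTES = 64

blocks_per_gbyte = 2 ** 30 // BLOCK_BYTES                 # 2^24 blocks of 64 bytes
block_area_mm2 = DRAM_MM2_PER_GBYTE / blocks_per_gbyte    # area of one memory block
R_c = CORE_MM2 / block_area_mm2                           # memory blocks per core area
```

The result is a little above 3000, which is the rounded value used in the cost estimates.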

The actual functions F in the designs that we attack are often ad hoc and
have not been implemented in hardware yet. Moreover, the numbers may change
when going to a smaller feature size. To make our estimates of the attack costs
architecture-independent, we introduce a simpler metric, the time-memory
product TM:

    $TM_{B_q} = L_q \frac{M}{q}$,    (4)

which for not so high computational penalties gives a good approximation of AT.

4 We take low-area implementations, as possible parallelism is already taken into
account.
In our tradeoff attacks, we are mainly interested in comparing the AT and TM
costs of $B_q$ with those of the default algorithm A. Thus we define the AT ratio
of $B_q$ as $AT_{B_q}/AT_A$ and the TM ratio of $B_q$ as $TM_{B_q}/TM_A$.

We note that for the same TM value the implementation with less memory
is preferable, as its design and production will be cheaper. Thus we explore how
much the memory can be reduced while keeping the AT or TM costs below those of
the default algorithm.

Definition 2. Tradeoff algorithms $\mathcal{B}$ have AT compactness q if it is the maximal
q such that

    $AT_{B_q} \le AT_A$.

Tradeoff algorithms $\mathcal{B}$ have TM compactness q if it is the maximal q such that

    $TM_{B_q} \le TM_A$.

For the concrete schemes we take “minimally secure” values of T, i.e. those
that are supposed to have $(\alpha, \Phi)$-security for reasonably high $\alpha$. Unfortunately, no
explicit security claim of this kind is present in the design documents of the
functions we consider.

Data-Dependent and Data-Independent Schemes. The existing schemes can be
categorized according to the way they access memory. The data-independent
schemes Catena [20], Pomelo [36], and Argon2i [13] compute $\phi(j)$ independently
of the actual password in order to avoid timing attacks like in [33]. Then the
algorithm B that uses less memory can recompute the missing blocks just by the
time they are requested. Therefore, it has the same latency as the full-memory
algorithm A, i.e. $L(B) = L(A)$. For these algorithms the time-memory product can be
arbitrarily small, and the minimum AT value is determined by the core-memory
ratio.

The data-dependent schemes scrypt [30], yescrypt [31], and Argon2d [13] compute
$\phi(j)$ using the just computed block: $\phi(j) = \phi(j, X[i_{j-1}])$. Then precomputation is
impossible, and for each recomputed block the latency is increased by the latency
of the recomputation algorithm, so $L_q > L(A)$. There exist hybrid schemes [25],
which first run a data-independent phase and then a data-dependent one.

3 Cryptanalysis of Catena-Dragonfly 

3.1 Description 

Short History. Catena was first published on ePrint [20] and then submitted to
the Password Hashing Competition. Eventually the paper was accepted to
Asiacrypt 2014 [21]. In the middle of the reviewing process, we discovered and
communicated the first attack on Catena to the authors. The authors then introduced
a new mode for Catena in the camera-ready version of the Asiacrypt paper, which
is resistant to the first attack. The final version of Catena, which is a finalist
of the Password Hashing Competition, contains two modes: Catena-Dragonfly
(which we abbreviate to Catena-D), which is an extension of the original Catena,
and Catena-Butterfly, which is a new mode advertised as tradeoff-resistant. In
this paper we present the attack on Catena-Dragonfly, which is very similar to
the first attack on Catena.

Specification. Catena-D is essentially a mode of operation over the hash function
F, which is instantiated by Blake2b [10] in the full or reduced-round version.
The functional graph of Catena-D is determined by the time parameter $\lambda$ (values
$\lambda = 1, 2$ are recommended) and the memory parameter n, and can be viewed as
a $(\lambda + 1)$-layer graph with $2^n$ vertices in each layer (denoted by Catena-D-$\lambda$). We
denote the X-th vertex in layer l (both counted from 0) by $[X]^l$. With each vertex
we associate the corresponding output of the hash function F and denote it by
$[X]^l$ as well. The outputs are stored in the memory, and due to the memory
access pattern it is sufficient to store only $2^n$ blocks at each moment. The hash
function F has 512-bit output, so the total memory requirements are $2^{n+6}$ bytes.
The first layer is filled as follows:

- $[0]^0 = G_1(P, S)$, where $G_1$ invokes 3 calls to F;

- $[1]^0 = G_2(P, S)$, where $G_2$ invokes 3 calls to F;

- $[i]^0 \leftarrow F([i-1]^0, [i-2]^0)$, $2 \le i \le 2^n - 1$.

Then $2^{3n/4}$ nodes of the first layer are modified by function $\Gamma$. The details of $\Gamma$
are irrelevant to our attack.

The memory access pattern at the next layers is determined by the bit-reversal
permutation $\nu$. Each index is viewed as an n-bit string and is transformed
as follows:

    $\nu(x_1 x_2 \ldots x_n) = x_n x_{n-1} \ldots x_1$, where $x_i \in \{0, 1\}$.

The layers are then computed as

- $[0]^l \leftarrow F([0]^{l-1} \,\|\, [2^n - 1]^{l-1})$;

- $[i]^l \leftarrow F([i-1]^l \,\|\, [\nu(i)]^{l-1})$.

Thus to compute $[X]^l$ we need $[\nu(X)]^{l-1}$. The latter can then be overwritten^5.
An example of Catena-D with $\lambda = 2$ and n = 3 is shown in Fig. 1.

The bit-reversal permutation is supposed to provide memory-hardness. The 
intuition is that it maps any segment to a set of blocks that are evenly distributed 
at the upper layer. 
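
A compact way to see the layered structure is the following Python sketch. It is a toy model under several assumptions that should be stated plainly: BLAKE2b from `hashlib` stands in for F, the functions $G_1$ and $G_2$ are replaced by single tagged calls, the $\Gamma$ update is omitted, each block at a higher layer is assumed to combine its predecessor in the same layer with the bit-reversed block of the previous layer, and two layers are kept in memory for readability instead of overwriting in place. It requires n of at least 1.

```python
import hashlib

def F(*blocks: bytes) -> bytes:
    """Stand-in for the 512-bit hash function (BLAKE2b with default 64-byte output)."""
    return hashlib.blake2b(b"".join(blocks)).digest()

def bit_reverse(x: int, n: int) -> int:
    """nu(x_1 x_2 ... x_n) = x_n ... x_2 x_1 on n-bit indices."""
    return int(format(x, f"0{n}b")[::-1], 2)

def catena_d_sketch(password: bytes, salt: bytes, n: int, lam: int) -> bytes:
    N = 1 << n
    layer = [b""] * N
    layer[0] = F(b"G1", password, salt)       # placeholder for G_1
    layer[1] = F(b"G2", password, salt)       # placeholder for G_2
    for i in range(2, N):
        layer[i] = F(layer[i - 1], layer[i - 2])
    # The Gamma update of 2^(3n/4) blocks is deliberately left out here.
    for _ in range(lam):
        prev, layer = layer, [b""] * N
        layer[0] = F(prev[0], prev[N - 1])
        for i in range(1, N):
            layer[i] = F(layer[i - 1], prev[bit_reverse(i, n)])
    return layer[N - 1]

tag = catena_d_sketch(b"password", b"salt", n=4, lam=2)
```

Consecutive indices i, i+1, ... of one layer read blocks of the previous layer whose indices differ in the most significant bits, which is the even spreading that the bit-reversal permutation is meant to provide.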

Original Tradeoff Analysis. The authors of Catena-D originally provided two
types of security bounds against tradeoff attacks. Recall that Catena-D-$\lambda$ can be
computed with $\lambda 2^n$ calls to F using $2^n$ memory blocks. The Catena-D designers
5 In terms of Eq. (1) we could enumerate all blocks as $[i]^j = X[j \,\|\, i]$, with i
written as an n-bit string, so that $\phi(j \,\|\, i) = (j-1) \,\|\, \nu(i)$.
Fig. 1. Catena-D-2 with n = 3: 3 layers, 8 vertices per layer.


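demonstrated that Catena-D-$\lambda$ can be computed using $\lambda S$ memory blocks with
time complexity^6

    $T \le 2^n + 2^n \left(\frac{2^n}{2S}\right)^{\lambda}$.

Therefore, if we reduce the memory by the factor of q, i.e. use only $2^n/q$ blocks,
we get the following penalty:

    $CP_\lambda(q) \approx \left(\frac{\lambda q}{2}\right)^{\lambda}$.    (5)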

The second result is the lower bound for tradeoff attacks with memory reduction
by q:

    $CP_\lambda(q) \ge \Omega(q^\lambda)$.    (6)

However, the constant in $\Omega()$ is too small ($2^{-18}$ for $\lambda = 3$) to be helpful in
bounding tradeoff attacks for small q. More importantly, the proof is flawed:
the result for $\lambda = 1$ is incorrectly generalized for larger $\lambda$. The reason seems to
be that the authors assumed some independence between the layers, which is
apparently not the case (and is somewhat exploited in our attack).

In the further text we demonstrate a tradeoff attack yielding much smaller
penalties than Eq. (5) and thus asymptotically violating Eq. (6).


3.2 Our Tradeoff Attack on Catena-D 

The idea of our method is based on the simple fact that

    $\nu(\nu(X)) = X$,

where X can be a single index or a set of indices. We exploit it as follows. We
partition layers into segments of length $2^k$ for some integer k, and store the first
block of every segment (first two blocks at layer 0). As the index of such a block
ends with k zeros, we denote the set of these blocks as $[*^{n-k}0^k]$. We also store
all $2^{3n/4}$ blocks modified by $\Gamma$, which we denote by $[\Gamma]$.

6 This result is a part of Theorem 6.3 in [20].


A. Biryukov and D. Khovratovich 


Consider a single segment $[AB*^k]$, where A is a k-bit constant and B is an
(n − 2k)-bit constant. Then

    $\nu([AB*^k]) = [*^k \nu(B)\nu(A)]$.

Blocks $[*^k \nu(B)\nu(A)]$ belong to $2^k$ segments that have $\nu(B)$ in the middle of the
index. Denote the union of these segments by $[*^k \nu(B) *^k]$. Now note that

    $\nu([*^k \nu(B) *^k]) = [*^k B *^k]$,

and

    $\nu(\nu([*^k B *^k])) = [*^k B *^k]$.
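
These set identities are easy to check exhaustively for small parameters. The sketch below does so in Python; the helper names are chosen for this example, indices are encoded as integers with A in the top k bits and the free part in the low k bits, and the middle part is assumed to be at least one bit wide.

```python
def bit_reverse(x: int, n: int) -> int:
    """Reverse x viewed as an n-bit string."""
    return int(format(x, f"0{n}b")[::-1], 2)

def segment(A: int, B: int, n: int, k: int) -> set:
    """Indices [A B *^k]: k-bit prefix A, (n-2k)-bit middle B, free k-bit suffix."""
    base = (A << (n - k)) | (B << k)
    return {base | s for s in range(1 << k)}

def middle_union(B: int, n: int, k: int) -> set:
    """Indices [*^k B *^k]: the 2^k segments that have B in the middle."""
    return {(a << (n - k)) | (B << k) | s
            for a in range(1 << k) for s in range(1 << k)}

def check_closure(n: int, k: int) -> bool:
    m = n - 2 * k                        # width of the middle part B
    for A in range(1 << k):
        for B in range(1 << m):
            nu_B = bit_reverse(B, m)
            image = {bit_reverse(x, n) for x in segment(A, B, n, k)}
            if not image <= middle_union(nu_B, n, k):
                return False             # image left the 2^k expected segments
            back = {bit_reverse(x, n) for x in middle_union(nu_B, n, k)}
            if back != middle_union(B, n, k):
                return False             # second application did not return
    return True
```

For instance, `check_closure(10, 3)` confirms that the image of every segment stays inside $2^k$ segments and that a second application of the permutation leads back to the segments around B.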

Therefore, when we iterate the permutation $\nu$, we are always within some $2^k$
segments. We suggest the computing strategy in Algorithm 1. At layer t we
recompute $2^k$ full segments from layers 0 to t − 2 and $2^k$ subsegments of length
$\nu(A)$ (interpreted as a number in the binary form) at layer t − 1. Therefore, the
total cost of computing layer t is

    $C(t) = \sum_A \sum_B \left((t-1)2^{2k} + \nu(A)2^k + 2^k\right)$    (7)

    $= \sum_A \left((t-1)2^n + \nu(A)2^{n-k} + 2^{n-k}\right)$

    $= (t-1)2^{n+k} + 2^{n+k-1} + 2^n = \left(t - \tfrac{1}{2}\right)2^{n+k} + 2^n$.

The total cost of computing Catena-D-$\lambda$ is

    $2^n \left(\frac{\lambda^2}{2} 2^k + \lambda + 1\right)$.
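
The summation can be verified numerically. The following sketch evaluates the double sum of Eq. (7) term by term and compares it with the closed forms; the function names are illustrative. The exact sum of $\nu(A)$ over all k-bit values is $2^{2k-1} - 2^{k-1}$, so the closed form, which rounds this to $2^{2k-1}$, overestimates each layer by exactly $2^{n-1}$ calls.

```python
from fractions import Fraction

def bit_reverse(x: int, n: int) -> int:
    return int(format(x, f"0{n}b")[::-1], 2)

def layer_cost_exact(t: int, n: int, k: int) -> int:
    """Double sum of Eq. (7), with nu(A) read as a k-bit number."""
    total = 0
    for A in range(1 << k):
        per_segment = (t - 1) * 2 ** (2 * k) + bit_reverse(A, k) * 2 ** k + 2 ** k
        total += per_segment * 2 ** (n - 2 * k)      # the same for every B
    return total

def layer_cost_closed(t: int, n: int, k: int) -> Fraction:
    """(t - 1/2) * 2^(n+k) + 2^n."""
    return Fraction(2 * t - 1, 2) * 2 ** (n + k) + 2 ** n

def total_cost_closed(lam: int, n: int, k: int) -> Fraction:
    """2^n * (lam^2 / 2 * 2^k + lam + 1)."""
    return 2 ** n * (Fraction(lam * lam, 2) * 2 ** k + lam + 1)

n, k, lam = 12, 3, 3
gaps = [layer_cost_closed(t, n, k) - layer_cost_exact(t, n, k) for t in (1, 2, 3)]
summed = 2 ** n + sum(layer_cost_closed(t, n, k) for t in range(1, lam + 1))
```

Adding the $2^n$ calls for layer 0 to the closed-form costs of layers 1 to $\lambda$ reproduces the total cost expression.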

We store $(t+1)2^{n-k}$ blocks as segment starting points, $2^{3n/4}$ blocks $[\Gamma]$ and $2^{2k}$
blocks for intermediate computations. For $k = \log q + \log(\lambda + 1)$ and $q < 2^{n/4}$


Algorithm 1. Tradeoff for Catena-Dragonfly.

1. Compute layer 0 storing $[*^{n-k}0^k]^0$ and $[*^{n-k}0^{k-1}1]^0$, i.e. the first two blocks of
every segment.

2. Compute $\Gamma$ and store all the updated blocks $[\Gamma]$ in the memory.

3. Compute layer 1 segmentwise: for each segment $[AB*^k]^1$ recompute blocks
$[*^k \nu(B)\nu(A)]^0$ using stored blocks from layer 0 and $[\Gamma]$. Store blocks $[*^{n-k}0^k]^1$.

4. Compute layer 2 segmentwise: for each segment $[AB*^k]^2$ recompute $2^k$ segments
$[*^k B *^k]^0$ using stored blocks from layer 0, then use them to recompute blocks
$[*^k \nu(B)\nu(A)]^1$ using $[\Gamma]$, then compute $[AB*^k]^2$. Store blocks $[*^{n-k}0^k]^2$.

5. Compute layer 3 segmentwise: for each segment $[AB*^k]^3$ recompute $2^k$ segments
$[*^k \nu(B) *^k]^0$ using stored blocks from layer 0, then recompute $2^k$ segments $[*^k B *^k]^1$
using stored blocks from layer 1 and $[\Gamma]$, then recompute blocks $[*^k \nu(B)\nu(A)]^2$,
then compute $[AB*^k]^3$. Store blocks $[*^{n-k}0^k]^3$.

6. Compute other layers in a similar fashion.
we store about $2^n/q$ blocks, so the memory is reduced by the factor of q. This
value of k yields the total computational complexity of

    $C_q = 2^n \left(\frac{q\lambda^2(\lambda+1)}{2} + \lambda + 1\right)$.    (8)

Since the computational complexity of the memory-full algorithm is $(\lambda+1)2^n$,
our tradeoff method gives the computational penalty

    $\frac{q\lambda^2}{2} + 1$.

Since Catena is a data-independent scheme, the latency of our method does
not increase. Therefore, the time-memory product (Eq. (4)) can be reduced by
the factor of $2^{n/4}$. We can estimate how the AT costs evolve assuming the reference
implementation in Sect. 2.3:

    $AT_{B_q} = 2^n \left(\frac{q\lambda^2(\lambda+1)}{2} + \lambda + 1\right) \cdot 3000 + (\lambda+1)2^n \cdot \frac{2^n}{q}$.

For $q = 2^{n/5}$ and $\lambda = 2$ we get

    $AT_{B_{2^{n/5}}} \approx 2^n (6 \cdot 2^{n/5}) \cdot 2^{11.5} + 3 \cdot 2^{9n/5}$.

For n = 24 (1 GB of RAM) we get

    $AT_{B_{2^{4.8}}} \approx 2^{24+2.5+4.8+11.5} + 2^{43.2+1.5} \approx 2^{44}$,

whereas

    $AT_{B_1} = 2^{49.5}$.

Therefore, we expect the time-area product to drop by a factor of about
25 if the memory is reduced by a factor of 30. In the terms of Definition 1,
Catena-D-2 is not (1/25, AT)-secure. Our tradeoff method also has AT and
TM compactness at least $2^{n/4} = 64$.

On other architectures the AT may drop even further, and we expect that an 
adversary would choose the one that maximizes the tradeoff effect, so the actual 
impact of our attack can be even higher. 
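
The AT estimate can be re-evaluated with a few lines of Python. The sketch below plugs the complexity of Eq. (8) and the unchanged latency into Eq. (3) with the core-memory ratio of Sect. 2.3; the function names are chosen for this example, and exact constants are used instead of rounded exponents, so the resulting ratio is in the same range as, but not identical to, the rounded figure quoted in the text.

```python
R_C = 3000                       # core-memory ratio from Sect. 2.3

def catena_d_costs(n: int, lam: int, q: float):
    """(C_q, L_q, memory in blocks); q = 1 denotes the full-memory algorithm."""
    full = (lam + 1) * 2 ** n
    if q == 1:
        C_q = full
    else:
        C_q = 2 ** n * (q * lam ** 2 * (lam + 1) / 2 + lam + 1)   # Eq. (8)
    return C_q, full, 2 ** n / q     # the latency stays that of the full algorithm

def at_cost(n: int, lam: int, q: float) -> float:
    C_q, L_q, mem = catena_d_costs(n, lam, q)
    return C_q * R_C + L_q * mem     # Eq. (3)

n, lam = 24, 2
ratio = at_cost(n, lam, 1) / at_cost(n, lam, 2 ** (n / 5))
```

Varying `q` in this sketch shows how the first term (cores) grows while the second term (memory) shrinks, which is why the gain saturates for large memory reductions.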


Violation of Catena-D Lower Bound. Our method shows that the Catena-D
lower bound is wrong. If we sum up the computational costs for $\lambda$ layers,
we obtain the following computational penalty for the memory reduction by the
factor of q:

    $CP_\lambda(q) = O(\lambda^3 q)$,

which is asymptotically smaller than the lower bound $\Omega(q^\lambda)$ (Eq. (6)) from the
original Catena submission [20].
Table 2. Computation-memory tradeoff for Catena-D-3 and Catena-D-4.

                    Computational penalty
                    Catena-D-3              Catena-D-4
Memory fraction     Our         [20]        Our         [20]
1/2                 7.4         36.2        13.8        512
1/4                 15.5        252         26.6        7373
1/8                 30.1        1872        52          2^17
1/2^l               2^{l+1.9}   2^{3l}      2^{l+2.8}   2^{4l+1.5}


3.3 Other Results for Catena 

Our attack on Catena can be further scrutinized and generalized to non-even 
segments. More details are provided in [14] with the summary given in Table 2. 

4 Generic Precomputation Tradeoff Attack 

Now we try to generalize the tradeoff method used in the attack on Catena to a
class of data-independent schemes. We consider schemes Q where each memory
block is a function of the previous block and some earlier block:

    $X[i] \leftarrow F(X[i-1], X[\phi(i)]), \quad 0 < i < T$,

where $\phi$ is a deterministic function such that $\phi(i) < i$. A group of existing
password hashing schemes falls into this category: Catena [20], Pomelo [36],
Lyra2 [25] (first phase). Multiple iterations of such a scheme are equivalent to a
single iteration with larger T and an additional restriction

    $x - M < \phi(x)$,

so that the memory requirements are M blocks.

The crucial property of the data-independent attacks is that they can be 
tested and tuned offline, without hashing any real password. An attacker may 
spend significant time to search for an optimal tradeoff strategy, since it would 
then apply to the whole set of passwords hashed with this scheme. 

Precomputation Method. Our tradeoff method generalizes as follows. We divide
memory into segments and store only the first block of each segment. For every
segment I we calculate its image $\phi(I)$. Let $\bar\phi(I)$ be the union of segments that
contain $\phi(I)$. We repeat this process until we get an invariant set $U_k = U(I)$:

    $U_0 \to U_1 \to U_2 \to \cdots$

The scheme Q is then computed according to Algorithm 2.
Algorithm 2. Precomputation method

- For all segments precompute the block indices in the union chains $I \to U(I)$.

- Compute Q by segment. For each segment I:

1. Compute blocks $U(I) = U_k$;

2. Compute blocks in $U_{k-1}$, then in $U_{k-2}$, up to $U_0 = I$.

3. Store the first block of I in the memory.


The total amount of calls to F is $\sum_{i \ge 0} |U_i|$, and the penalty to compute I is

    $CP(I) = \frac{\sum_{i \ge 0} |U_i|}{|I|}$.
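
The union chains and the resulting penalty can be computed offline, as the following Python sketch illustrates. Its assumptions: blocks are numbered layer by layer so that index x encodes (layer, position), the indexing function is the layered bit reversal of Catena-D without the $\Gamma$ step, and the chain stops when no further references exist or when a set repeats. The helper names are hypothetical.

```python
def bit_reverse(x: int, n: int) -> int:
    return int(format(x, f"0{n}b")[::-1], 2)

def union_chain(segment, seg_len: int, phi):
    """U_0 = segment; U_{i+1} = union of the whole segments hit by phi(U_i)."""
    chain = [set(segment)]
    while True:
        image = {phi(x) for x in chain[-1]}
        image.discard(None)                          # blocks without a reference
        segs = {y // seg_len for y in image}
        nxt = {s * seg_len + j for s in segs for j in range(seg_len)}
        if not nxt or nxt in chain:
            return chain
        chain.append(nxt)

def penalty(chain) -> float:
    """Sum of |U_i| divided by |I|."""
    return sum(len(U) for U in chain) / len(chain[0])

n, k = 8, 2
N, seg_len = 1 << n, 1 << k

def phi(x: int):
    """Layer j, position i refers to layer j-1, position nu(i); layer 0 has no reference."""
    layer, i = divmod(x, N)
    return None if layer == 0 else (layer - 1) * N + bit_reverse(i, n)

start = 2 * N + 5 * seg_len                          # one segment at layer 2
chain = union_chain(range(start, start + seg_len), seg_len, phi)
sizes = [len(U) for U in chain]
```

For this segment the chain has sizes $2^k$, $2^{2k}$, $2^{2k}$, so the penalty is $1 + 2 \cdot 2^k$; the sets stop growing after the first step, which is what makes the precomputation tradeoff cheap for bit-reversal indexing.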


How efficient the tradeoff is depends on the properties of $\phi$ and the segment
partition, i.e. how fast $U_i$ expands. As we have seen, Catena uses a bit permutation for
$\phi$, whereas Lyra2 uses a simple arithmetic function or a bit permutation [20,25].
In both cases $U_i$ stabilizes in size after two iterations. If $\phi$ is a more sophisticated
function, the following heuristics (borrowed from our attacks on data-dependent
schemes) might be helpful:

- Store the first $T_1$ computed blocks and the last $T_2$ computed blocks for some
$T_1, T_2$ (usually about $N/q$).

- Keep the list $\mathcal{L}$ of the most expensive blocks to recompute and store $M[i]$ if
$\phi(i) \in \mathcal{L}$ (Fig. 2).



Fig. 2. Segment unions in the precomputation method. 


5 Generic Ranking Tradeoff Attack 

Now we present a generic attack on a wide class of schemes with data-dependent 
memory addressing. Such schemes include scrypt [30] and the PHC finalists 
yescrypt [31], Argon2d [13], and Lyra2 [25]. We consider the schemes described 
by Eq. (1) with k = 2 and the following addressing (cf. also Fig. 3): 

$$X[1] = \mathit{In};$$
$$\text{for } 1 < i < T: \quad r_i = g(X[i-1]); \quad X[i] = F(X[i-1], X[r_i]). \qquad (9)$$

Here $g$ is some indexing function. This construction and our tradeoff method
can be easily generalized to multiple functions F, to stateful functions (like in 
Lyra2), to multiple inputs, outputs, and passes, etc. However, for the sake of 
simplicity we restrict to the construction above. 



Fig. 3. Data-dependent schemes. 


Our tradeoff strategy is as follows: we compute the blocks sequentially and
for each block $X[i]$ decide whether to store it. If we do not store it, we calculate
its access complexity $A(i)$, the number of calls needed to recompute it, as the sum
of the access complexities of $X[i-1]$ and $X[r_i]$ plus one. If we store $X[i]$, its access
complexity is 0.
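
The recursion for the access complexity can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name, the list `r` of reference indices, and the convention that block 0 is the stored input are assumptions made for the example.

```python
def access_complexities(r, stored):
    """Return the access complexity A[i] of every block i.

    r[i]   -- index of the second input of block i (r[0] is unused);
              hypothetical data, any values with r[i] < i work.
    stored -- set of block indices kept in memory (block 0 is assumed stored).
    """
    A = [0] * len(r)
    for i in range(1, len(r)):
        if i in stored:
            A[i] = 0                        # a stored block costs nothing to access
        else:
            A[i] = A[i - 1] + A[r[i]] + 1   # recompute both inputs, then one call to F
    return A


print(access_complexities([0, 0, 0, 1, 2], stored={0, 2}))
```

Storing a block resets its complexity to zero, which in turn lowers the complexity of every later block that depends on it.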

The storing heuristic rule is the crucial element of our strategy. The idea is
to store the block if $A(r_i)$ is too high.

Our ranking tradeoff method works according to Algorithm 3 (Fig. 4). 


Algorithm 3. Ranking method

1. Split the memory into segments of $s$ blocks;

2. Keep the sorted list $\mathcal{L}$ of the $T/l$ highest access complexities, initialized with all
zeros;

3. Temporarily store the last $w$ blocks;

4. Compute blocks sequentially. For block $X[i]$, if $X[r_i]$ is missing, recompute it;

5. If $X[i]$ is the starting block of a segment, store it and set $A(i) = 0$;

6. If $X[i]$ is not the starting block of a segment, but $A(r_i) \in \mathcal{L}$, store $X[i]$ and
set $A(i) = 0$;

7. If $X[i]$ is not the starting block of a segment, and $A(r_i) \notin \mathcal{L}$, do not store
$X[i]$ and set $A(i) = A(r_i) + A(i-1) + 1$.


Here $w$, $s$ and $l$ are parameters, and we usually set $l = 3s$. The computational
complexity is computed as

$$C = \sum_i \bigl(A(r_i) + 1\bigr).$$

We also compute the latency $L(i)$ of each block as $L(i) = \max(L(r_i), L(i-1)) + 1$
if we do not store $X[i]$ and $L(i) = 0$ if we store it. Then the total latency is

$$L = \sum_i \bigl(L(r_i) + 1\bigr).$$


Fig. 4. Outline of the ranking tradeoff method. 


We implemented our attack and tested it on the class of functions described
by Eq. (9). For fixed $w$ and $s$, the total number of calls to $F$ and the number
of stored blocks are entirely determined by the indices $\{r_i\}$. Thus we do not have
to implement a real hash function; it is sufficient to generate $r_i$ according
to some distribution, model the computation as a directed acyclic graph, and
compute $C$ and $L$ for this graph. We ran a number of tests with uniformly
random $r_i$ (within the segment $[0; i]$), $T = 2^{12}$, and different values of $w$ and
$s$. Then we grouped the $C$ and $L$ values by memory complexity and determined the
lowest complexities for each memory reduction factor. These values are given in
Table 3.
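
The kind of offline experiment described above can be sketched as follows. The sketch is deliberately simplified and is not the authors' code: it stores only the first block of each segment (no ranking list, no window), charges $A(r_i) + 1$ calls for block $i$ on the assumption that $X[i-1]$ is still at hand, and all function names are hypothetical.

```python
import random


def evaluate(r, s):
    """Evaluate a simplified strategy that stores only segment-starting blocks.

    r -- reference indices, r[i] < i for i >= 1 (r[0] is unused)
    s -- segment length
    Returns (stored blocks, total calls to F, largest per-block latency).
    """
    T = len(r)
    A = [0] * T   # access complexity of each block
    L = [0] * T   # latency of each block
    stored, calls = 0, 0
    for i in range(T):
        if i % s == 0:
            stored += 1                       # segment start: A(i) = L(i) = 0
        else:
            A[i] = A[i - 1] + A[r[i]] + 1
            L[i] = max(L[i - 1], L[r[i]]) + 1
        if i > 0:
            calls += A[r[i]] + 1              # recompute X[r_i] if missing, then one call
    return stored, calls, max(L)


def random_indices(T, seed=0):
    """Draw r_i uniformly from [0, i) with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return [0] + [rng.randrange(i) for i in range(1, T)]


stored, calls, depth = evaluate(random_indices(2 ** 12), s=8)
print(stored, calls / 2 ** 12, depth)
```

Because only the indices matter, the same loop can be rerun for many values of `s` without ever evaluating a hash function, which is the point made in the paragraph above.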


Table 3. Computational, latency, AT (for $R_c = 3000$ and $M = 2^{24}$), and TM penalties
for the ranking tradeoff attack on generic data-dependent schemes.

Memory fraction (1/q)        1/2    1/3    1/4    1/5    1/6    1/7    1/8    1/9    1/10
Computation penalty CP(q)    1.59   2.98   7.3    16.6   57.5   180    635    3340   2^13.2
Latency penalty LP(q)        1.56   2.55   4      5.8    8.7    11.6   15.4   21.1   24.8
AT ratio                     0.78   0.85   1.02   1.16   1.45   1.69   2.04   2.97   4.24
TM ratio                     0.78   0.85   1.02   1.16   1.45   1.65   1.9    2.34   2.48
Segment length s             3      5      8      10     13     16     18     21     23
Window size (fraction)       0.06   0.01   0.01   0      0      0      0      0      0


We conclude that generic 1-pass data-dependent schemes with random
addressing are (0.75, AT)- and (0.75, TM)-secure using our ranking method. Both
AT and TM ratios exceed 1 when $q \ge 4$, so both the AT- and the TM-compactness
are about 4.

6 Cryptanalysis of yescrypt

6.1 Description

yescrypt [31] is another PHC finalist, which is built upon scrypt and is notable for
its high memory filling rate (up to 2 GB/sec) and a number of features, which
include custom S-boxes to thwart exhaustive search on GPUs, multiplicative
chains to increase the ASIC latency, and some others. yescrypt is essentially a
family of functions, each member activated by a combination of flags. Due to
the page limits, we consider only one function of the family.

Here we consider the yescrypt setting where the flag yescrypt_RW is set, there is
no parallelism, and no ROM (hereafter simply yescrypt). It operates on
1024-byte memory blocks $X[1], X[2], \ldots, X[M]$. The scheme works as follows:

$$X[i] \leftarrow F(X[i-1] \oplus X[\phi(i)]), \quad 1 < i \le M;$$
$$Y \leftarrow X[M];$$
$$Y \leftarrow X[Y \bmod M] \leftarrow F(Y \oplus X[Y \bmod M]), \quad M < i \le T.$$

Here $F$ and $F'$ are compression functions (the details of $F'$ are irrelevant for the
attack). Therefore, the memory is filled in the first $M$ steps and then $(T - M)$
blocks are updated using the state variable $Y$. Here $\phi(i)$ is the data-dependent
indexing function: it takes 32 bits of $X[i-1]$ and interprets them as a random block
index among the last $2^k$ blocks, where $2^k$ is the largest power of 2 that is smaller
than $i$.
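
A small sketch of such an indexing function is given below. How the 32-bit word is reduced is an assumption of the example (here: modulo $2^k$, shifted so that the result falls among the $2^k$ blocks immediately preceding $i$); the function names are hypothetical and the real yescrypt code may differ in details.

```python
def largest_pow2_below(i):
    """Largest power of two strictly smaller than i (requires i >= 2)."""
    return 1 << ((i - 1).bit_length() - 1)


def phi(i, word):
    """Map a 32-bit word taken from X[i-1] to an index among the last 2^k blocks before i."""
    p = largest_pow2_below(i)
    return (i - p) + (word % p)


for i in (2, 5, 9, 1000):
    print(i, largest_pow2_below(i), phi(i, 0xDEADBEEF))
```

The returned index always lies in the range from $i - 2^k$ to $i - 1$, so every reference points to an already computed block.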

Transformation $F$ operates on 1024-byte blocks as follows:

- Blocks are partitioned into 16 64-byte subblocks $B_0, B_1, \ldots, B_{15}$.

- New blocks are produced sequentially:

$$B_0^{new} \leftarrow f(B_0^{old} \oplus B_{15}^{old});$$
$$B_i^{new} \leftarrow f(B_{i-1}^{new} \oplus B_i^{old}), \quad 0 < i < 16.$$

The details of $f$ are irrelevant to our attack.
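
The chained structure of the transformation can be illustrated with a short sketch. Since the details of $f$ do not matter here, the example substitutes BLAKE2b as a stand-in for $f$; that substitution, the helper names and the test data are assumptions of the sketch and not part of yescrypt.

```python
import hashlib


def f(x):
    """Stand-in for the 64-byte function f (BLAKE2b is an assumption of this sketch)."""
    return hashlib.blake2b(x, digest_size=64).digest()


def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(a, b))


def transform(old):
    """old: list of 16 subblocks of 64 bytes each; returns the 16 new subblocks."""
    new = [f(xor(old[0], old[15]))]
    for i in range(1, 16):
        new.append(f(xor(new[i - 1], old[i])))
    return new


old = [bytes([i]) * 64 for i in range(16)]
new = transform(old)

# Changing old subblock 5 leaves new subblocks 0..4 untouched.
changed = list(old)
changed[5] = b"\xff" * 64
print(transform(changed)[:5] == new[:5])
```

Each new subblock depends only on the previous new subblock and one old subblock, which is the dependency pattern that the pipelining argument in the next subsection builds on.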

6.2 Tradeoff Attack on yescrypt 

Our crucial observation is that there is no diffusion from the last subblocks
to the first ones. Thus if we store all $B_0$, we break the dependencies between
consecutive blocks and the subblocks can be recomputed from $B_1$ to $B_{15}$ with
pipelining (Fig. 5). Suppose that the block $X[i]$ is computed with latency $L(i)$,
i.e. its computation tree has $L(i)$ levels if measured in $F$. However, if we consider
the tree of $f$, then the actual latency of $X[i]$ is $L(i) + 15$ instead of the expected
$16L(i)$ if measured in calls to $f$.

The tradeoff strategy is given in Algorithm 4.


Algorithm 4. Tradeoff attack on yescrypt.

1. Start the ranking tradeoff method with some parameters $w$, $s$;

2. Store $B_0$ of each block;

3. If $X[i]$ needs the missing block $X[r_i]$:

(a) Compute $B_0$ of $X[i]$ using one call to $f$, as all previous $B_0$ are stored;

(b) Compute $B_1$;

(c) While $B_1$ is recomputed, start recomputing $B_2$, as it needs exactly the same
subblocks used in the recomputation of $B_1$. This adds latency of one call to $f$;

(d) Compute $B_i$ for all other $i$.


If the missing block is recomputed by a tree of depth $D$, then the latency of
the new block is $D + 16$ measured in calls to $f$, or $\frac{D}{16} + 1$ if measured in calls to
$F$. This number should be compared to the latency $D + 1$ if we had not exploited
the iterative structure of $F$. Thus if the ranking method gives the total latency
$L$ (measured in $F$), the actual latency should be $\frac{L + 15}{16}$.
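
As a worked example of this arithmetic, the helper below (a hypothetical name, introduced only for illustration) converts a recomputation depth into a latency measured in calls to $F$, with and without pipelining, using the fact that one call to $F$ consists of 16 calls to $f$.

```python
def latency_in_F(depth, pipelined=True):
    """Latency of a new block, in calls to F, when its missing input has depth `depth`."""
    if pipelined:
        return (depth + 16) / 16   # D + 16 calls to f, 16 calls to f per call to F
    return depth + 1               # every tree level costs a full call to F


for depth in (16, 32, 160):
    print(depth, latency_in_F(depth), latency_in_F(depth, pipelined=False))
```

For large depths the pipelined latency approaches one sixteenth of the naive one, which is why the latency penalties reported for yescrypt grow so slowly.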

For the smallest secure parameter ($T = 4M/3$), the final computational
and latency penalties as well as the AT and TM penalties are given in Table 4 (1/16-th
of each block is added to the attacker’s memory). We conclude that yescrypt is only
(0.45, AT)- and (0.45, TM)-secure, whereas the AT compactness is 4 and the TM
compactness is 6. Since these numbers are worse than for generic 1-pass schemes,
our attack clearly signals a vulnerability in the design of BlockMix. We expect
that our attack becomes inefficient for $T = 2M$ and higher.


Table 4. Computational, latency, AT (for $R_c = 3000$ and $M = 2^{24}$), and TM penalties
for the ranking tradeoff attack on the yescrypt mode of operation with 4/3 passes, using
the iterative structure of $F$.

Memory fraction              1      1/2    1/3    1/4    1/5    1/6    1/7    1/8
Computation penalty CP(q)    1      2.9    26     1135   2^19   -      -      -
Latency penalty LP(q)        1      1.1    1.4    2      3.5    6.3    11.1   17.5
TM ratio                     1      0.55   0.47   0.5    0.75   1.05   1.59   2.19
AT ratio                     1      0.55   0.46   0.7    95     -      -      -

7 Future Work 

Our tradeoff methods apply to a wide class of memory-hard functions, so our 
research can be continued in the following directions: 

- Application of our methods to other PHC candidates and finalists: Pomelo [36] 
and the modified Lyra2.



Fig. 5. Pipelining the block computation in yescrypt: only the first subblock is computed 
with delay D. 


- Set of design criteria for the indexing functions that would withstand our 
attacks. 

- New methods that directly target schemes that make multiple passes over 
memory or use parallel cores. 

- Design a set of tools that helps to choose a proof-of-work instance in various 
applications: cryptocurrencies, proofs of space, etc. 

8 Conclusion 

Tradeoff cryptanalysis of memory-hard functions is a young, relatively unexplored,
and complex area of research combining cryptanalytic techniques with an
understanding of implementation aspects and hardware constraints. It has direct
real-world impact since its results can be immediately used in the ongoing arms
race of mining hardware for cryptocurrencies.

In this paper we have analyzed the memory-hard functions Catena-Dragonfly
and yescrypt. We show that Catena-Dragonfly is not memory-hard despite the original
claims and the security proof by the designers, since a hardware-equipped
adversary can reduce the attack costs significantly using our tradeoffs. We also
show that yescrypt is more tradeoff-resilient than Catena, though we can still
exploit several design decisions to reduce the time-memory and the time-area
product by a factor of 2.

We generalize our ideas to the generic precomputation method for data- 
independent schemes and the generic ranking method for the data-dependent 
schemes. Our techniques may be used to estimate the attack cost in various 
applications from the fast emerging area of memory-hard cryptocurrencies to 
the password-based key derivation. 


Acknowledgement. We would like to thank the authors of Catena for verifying and 
confirming our attack.

A Cryptanalysis of Lyra2 v1

A.1 Description of Lyra2 v1

Lyra2 [25] is a PHC finalist, notable for its high memory filling rate (up to
1 GB/sec). Very recently, Lyra2 has been significantly changed for the second
round of the competition. This section describes the original submission to
PHC [25], Lyra2 v1 (hereafter simply Lyra2).

Lyra2 is a hybrid hashing scheme, which uses data-independent addressing
in the first phase and data-dependent addressing in the second phase. Lyra2
operates on blocks of 768 bits (96 bytes) each, and fills the memory with $2^n \cdot C$
such blocks, where $n$ and $C$ are parameters, and $C$ is by default set to 128 [25,
p. 39]. In this paper we use $C = 128$. The entire memory is represented as a
$(2^n \times C)$-matrix $M$, and we refer to its components as rows and columns. Rows
are denoted by $M[i]$.

Lyra2 has two main phases: the single-iteration Setup phase, where the
memory is addressed data-independently, and the multiple-iteration Wandering
phase, where the memory is addressed data-dependently. The number $T$ of
iterations in the Wandering phase can be as low as 1, and we take this value in
our analysis.

Setup Phase. The first phase fills rows sequentially from $M[0]$ to $M[2^n - 1]$ as
follows:

$$M[0], M[1] \leftarrow G(\text{Password}, \text{Salt});$$
$$\text{for } i \text{ from } 2 \text{ to } 2^n - 1:$$
$$M[i] \leftarrow F(M[i-1], M[\phi(i)]);$$
$$M[\phi(i)] \leftarrow M[\phi(i)] \oplus \mathrm{rot}(M[i]).$$

Here $\phi(i) = 2^k - i$, where $2^k$ is the smallest power of 2 that is not smaller than
$i$, $\mathrm{rot}(M[\cdot])$ stands for the left rotation of each 768-bit word by 32 bits, and $G$ is a
cryptographic hash function.
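
The Setup-phase indexing function is simple enough to sketch directly. The function name is chosen for the example only; the formula is the one stated above.

```python
def phi(i):
    """Setup-phase index 2^k - i, where 2^k is the smallest power of two >= i (i >= 2)."""
    p = 1 << (i - 1).bit_length()
    return p - i


print([phi(i) for i in range(2, 9)])
print([phi(i) for i in range(5, 8)])
```

Consecutive indices that share the same power of two map to consecutive indices in reverse order, so a run of rows always refers back to another run of the same length.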

The following details of $F$ are relevant to our attack:

- Function $F$ is stateful: it operates on the 1024-bit state $S$, which is preserved
between rows.

- Function $F(X, Y)$ processes columns $X_i$, $Y_i$ of $X$ and $Y$ sequentially. The
internal state undergoes $C$ rounds (similarly to the duplex-sponge construction
[12]), where in round $i$ column $Z_i$ of the output $Z$ is produced as follows:

$$S \leftarrow P(S);$$
$$Z_i \leftarrow \text{768 least significant bits of } S.$$

Here $P$ is a single round of the Blake2b internal permutation [10]. We do not
exploit any specific property of $P$. Thus $F$ can be seen as a duplex-sponge
instantiated with a Blake2b round function.

We remind the reader that $Z$ is used not only to produce a new row $M[i]$ but
also to overwrite the row $M[2^k - i]$.

Wandering Phase. The Wandering phase transforms the blocks produced in the
Setup phase. First, it reverses the ordering. Then it operates similarly to the
Setup phase, but the second input block to $F$ is taken pseudo-randomly:

$$\text{for } i \text{ from } 1 \text{ to } 2^n - 1:$$
$$r_i \leftarrow g(M[i-1], i-1);$$
$$M[i] \leftarrow M[i] \oplus F(M[i-1], M[r_i]);$$
$$M[r_i] \leftarrow M[r_i] \oplus F(M[i-1], M[r_i]).$$

Here $g$ truncates the first input to the least significant 32 bits and XORs it with the
second input. All indices are computed modulo $2^n$.

A.2 Tradeoff Attack on the Setup and Wandering Phases of Lyra2

Our strategy for the Setup phase is similar to the one for Catena. Again, we
exploit the properties of the indexing function $\phi$.

Let us denote a segment of rows $\{M[i], M[i+1], \ldots, M[j]\}$ by $M[i : j]$.

Consider $a, b$ such that $2^{k-1} < a < b < 2^k$. Then

$$\phi([a : b]) = [(2^k - b) : (2^k - a)].$$

Thus to construct a single segment we need another segment of the same length.
This suggests the following strategy for computing $2^n$ rows in the Setup phase.

1. Store the first $2^{n-l}$ rows $M[0], \ldots, M[2^{n-l} - 1]$ for some $l > 0$ (parameter of the
attack).

2. Split the rows from $M[2^{n-l}]$ to $M[2^n - 1]$ into segments of length $q$ for some
$q < 2^{n-l}$. Store the entire state $S$ at the start of each segment.

Then to compute the segment $M[a : a + q - 1]$, $2^{k-1} < a < 2^k$, we have to
compute $M[\phi(a : a + q - 1)]$, which has been updated when computing segments
between $2^{k-2}$ and $2^{k-1}$. Eventually we reach the stored $2^{n-l}$ rows. To compute
$M[a : a + q - 1]$, $2^{k-1} < a < 2^k$, we need to compute a segment in the interval
$[2^i : 2^{i+1})$ for each $n - l \le i < k$ (Fig. 6).

Let us figure out the memory reduction and the computational overhead
of this procedure. We store the first $2^{n-l}$ rows and $\frac{2^{n-11}}{q}$ rows for the starting state of
each segment, then a segment of length $q$ during recomputation. For segments
between rows $M[2^{n-l}]$ and $M[2^{n-l+1}]$ we need 1 call to $F$ per row, as there is no
recomputation. For segments between rows $M[2^{n-l+1}]$ and $M[2^{n-l+2}]$ we need
2 calls to $F$ per row, and so on. In general, we make

$$(k - n + l)\,q \qquad (10)$$

calls to $F$ to compute a segment of length $q$ between row indices $2^{k-1}$ and $2^k$.
For the entire Setup phase we spend

$$\underbrace{2^{n-l}}_{M[0 : 2^{n-l}-1]} + \underbrace{2^{n-l}}_{M[2^{n-l} : 2^{n-l+1}-1]} + 2 \cdot 2^{n-l+1} + 3 \cdot 2^{n-l+2} + \cdots + l \cdot 2^{n-1} \le (l - 0.5)\,2^n$$

calls to $F$. The memory requirements are $2^{n-l} + q + \frac{2^{n-11}}{q}$ rows, which reaches the
minimum of $2^{n-l} + 2^{n/2-4.5}$ for $q = 2^{n/2-5.5}$.

To summarize, our tradeoff algorithm B has computational penalty $(l - 0.5)$
if the memory is reduced by a factor of $2^l$ (Table 5).
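
The count of calls to $F$ can be checked numerically with a few lines of Python. The sketch assumes the accounting described above (one call per stored row, and $j + 1$ calls per row in the $j$-th doubling interval after the stored part); the function name and the choice $n = 20$ are illustrative only.

```python
def setup_calls(n, l):
    """Calls to F in the Setup phase when only the first 2^(n-l) rows are stored."""
    total = 2 ** (n - l)                      # rows M[0 : 2^(n-l) - 1], one call each
    for j in range(l):                        # rows M[2^(n-l+j) : 2^(n-l+j+1) - 1]
        total += (j + 1) * 2 ** (n - l + j)   # j + 1 calls per row
    return total


for l in (2, 3, 4):
    print(l, setup_calls(20, l) / 2 ** 20, l - 0.5)
```

For $l = 2$ the count meets the bound $(l - 0.5)2^n$ exactly, and for larger $l$ it stays below it, so $(l - 0.5)$ is a safe estimate of the computational penalty.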



Fig. 6. Computing segment of length q with precomputation method in the Setup 
phase of Lyra2. 


Table 5. Computational-memory tradeoff for the Setup phase of Lyra2: our method
and designers’ analysis.

Memory fraction    Computational penalty (our method)    Computational penalty ([25])
1/4                1.5                                   2
1/8                2.5                                   4
2^-l               l - 0.5                               2^(l-1)


Access Complexity of a Single Row. In the next phase we will need to
calculate the cost of recomputing a single row rather than a segment. To compute
a single row, we need to recompute $(l - 0.5)$ segments on average, so the average
recomputation complexity is:

$$A = q(l - 0.5). \qquad (11)$$

Tradeoff Attack on the Wandering Phase of Lyra2 with T = 1. We
apply the ranking method to the Wandering phase of Lyra2. Since Lyra2 updates
two rows at once, its penalties are higher than in generic data-dependent schemes
and are given in Table 6.


Table 6. Average computational and depth penalties for the ranking method on the
Wandering phase of Lyra2, without exploiting the row pipelining.

Memory fraction        1/2    1/3    1/4    1/5    1/6    1/7    1/8    1/10   1/12   1/16
Computation penalty    2.7    10.4   75     1071   2^14   2^n    2^n    2^n    2^n    2^n
Depth penalty          2.4    4.8    8.9    15.4   23.8   35.7   49     83     124    193


A.3 Tradeoff for the Full Lyra2 with T = 1

Memory Partition. To run the attack on the full Lyra2 with fraction $1/l$ of
memory, we have to split the available memory between the Setup and Wandering
phases. Suppose that we allocate fraction $\alpha$ of memory for the Setup phase
and fraction $\beta$ of memory for the Wandering phase. Let $P_S(\alpha)$ be the penalty of
running the Setup phase with fraction $\alpha$, $P_R(\alpha)$ be the average access complexity
of a single row from the Setup phase run with fraction $\alpha$ (Eq. (11)), and $P_W(\beta)$
be the penalty of running the Wandering phase with fraction $\beta$ (Table 6). Then
the total memory fraction will be $\alpha + \beta$. To estimate the time penalty, we note
that in our tradeoff for the Wandering phase, each recomputation requests as
many rows from the Setup phase as there are hash calls made in the Wandering
phase. Therefore, the total time penalty can be estimated as

$$P(\alpha + \beta) = \frac{P_S(\alpha)\,2^n + P_R(\alpha)\,P_W(\beta)\,2^n}{2^{n+1}},$$

as we construct $2 \cdot 2^n$ blocks in two phases.


Exploiting Iterative Compression Function. Similarly to the attack on yescrypt,
we can exploit the fact that Lyra2 produces blocks of a row columnwise. Therefore,
we have to make $D$ calls to $P$ to compute the first column of the block,
whereas the computation of other columns can be pipelined: the second column of
the deepest tree level can be computed simultaneously with the first column of
one level higher. To compute all 128 columns, we spend the time needed to compute
$D + 128$ columns only, so the actual latency penalty is $1 + D/128$. Therefore,
the total latency penalty can be calculated as follows:

$$D(\alpha + \beta) = \frac{D_S(\alpha) + \frac{D_R(\alpha) + D_W(\beta)}{128} + 1}{2},$$

where $D_S(\alpha) = 1 - (\log \alpha)/256$ is the average latency penalty in the Setup phase,
$D_R(\alpha) = -\log \alpha - 0.5$ is the average latency penalty for accessing a row from
the Setup phase, and $D_W(\beta)$ is the depth penalty for the Wandering phase given
in Table 6.
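
The two combined penalties can be evaluated with a short sketch. It reflects one reading of the formulas in this subsection and rests on several assumptions: logarithms are taken to base 2, the Setup penalty and the per-row access cost are both taken as $l - 0.5$ for $\alpha = 2^{-l}$ (the per-row value with which the computational entries of Table 7 appear to agree), the two phases are averaged, and the function names are hypothetical.

```python
from math import log2


def comp_penalty(l_setup, p_wander):
    """Combined computational penalty for alpha = 2^-l_setup.

    P_S and P_R are both taken as l_setup - 0.5 (an assumption of this sketch);
    p_wander is the Wandering-phase penalty read from Table 6.
    """
    p_s = p_r = l_setup - 0.5
    return (p_s + p_r * p_wander) / 2


def latency_penalty(alpha, d_wander):
    """Combined latency penalty; d_wander is the depth penalty from Table 6."""
    d_s = 1 - log2(alpha) / 256
    d_r = -log2(alpha) - 0.5
    return (d_s + (d_r + d_wander) / 128 + 1) / 2


# Wandering fractions 1/3 and 1/4 combined with Setup fractions 1/8 and 1/16.
print(comp_penalty(3, 10.4), latency_penalty(1 / 8, 4.8))
print(comp_penalty(4, 75), latency_penalty(1 / 16, 8.9))
```

Under these assumptions the computed values land close to the corresponding columns of Table 7, which suggests the reading is at least consistent with the reported numbers.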

The results are given in Table 7. We conclude that Lyra2 is only (0.33, AT)-secure
and (0.1, TM)-secure. The AT compactness is 4 and the TM compactness
is 16. Thus Lyra2 v1 is more susceptible to tradeoff attacks than yescrypt.


Table 7. Computational, latency, AT (assuming $R_c = 3000$ and $M = 2^{24}$) and TM
penalties for the ranking tradeoff attack on Lyra2 v1 with $T = 1$. The memory fraction is
given as a sum of Wandering and Setup allocations.

Memory fraction         1      0.45       0.31        0.26        0.12         0.06
Wandering + Setup              1/3+1/8    1/4+1/16    1/5+1/16    1/10+1/64    1/17+1/256
Comp. penalty CP(q)     1      14.2       133         1876        2^19         -
Lat. penalty LP(q)      1      1.03       1.05        1.08        1.37         1.93
TM penalty              1      0.47       0.33        0.28        0.16         0.12
AT penalty              1      0.47       0.35        0.63        94           -

Abstract. At EUROCRYPT 2012 Pandey and Rouselakis introduced
the notion of property preserving symmetric encryption which enables
checking for a property on plaintexts by running a public test on the
corresponding ciphertexts. Their primary contributions are: (i) a separation
between ‘find-then-guess’ and ‘left-or-right’ security notions; (ii)
a concrete construction for left-or-right secure orthogonality testing in
composite order bilinear groups.

This work undertakes a comprehensive (crypt)analysis of property
preserving symmetric encryption on both these fronts. We observe that
the quadratic residue based property used in their separation result is
a special case of testing equality of one-bit messages, suggest a very
simple and efficient deterministic encryption scheme for testing equality
and show that the two security notions, find-then-guess and left-or-right,
are tightly equivalent in this setting. On the other hand, the separation
result easily generalizes for the equality property. So contextualized, we 
posit that the question of separation between security notions is prop- 
erty specific and subtler than what the authors envisaged; mandating 
further critical investigation. Next, we show that given a find-then-guess 
secure orthogonality preserving encryption of vectors of length 2n, there 
exists left-or-right secure orthogonality preserving encryption of vectors 
of length n, giving further evidence that find-then-guess is indeed a mean- 
ingful notion of security for property preserving encryption. Finally, we 
cryptanalyze the scheme for testing orthogonality. A simple distinguish- 
ing attack establishes that it is not even the weakest selective find-then- 
guess secure. Our main attack extracts out the subgroup elements used 
to mask the message vector and indicates greater vulnerabilities in the 
construction beyond indistinguishability. Overall, our work underlines 
the importance of cryptanalysis in provable security.

1 Introduction

The question of constructing practical cryptographic schemes for securing data in 
the cloud has attracted a lot of research during the last decade. Notions like order 
preserving encryption [8,10], attribute-based encryption [21,24,26], functional 
encryption [1,6, 14-16,25] and format preserving encryption [7] are useful for this 
purpose. The notions of IBE [11,12,19] and public key encryption with keyword 
search [13,17,33,34] deal with testing of equality. Homomorphic encryption too 
[22,23, 35] plays an important role in cloud security. These schemes aim to achieve 
data privacy, user privacy, secure computation on encrypted data, etc., on the 
cloud. 

At EUROCRYPT 2012 Pandey and Rouselakis [29] defined the notion of 
property preserving symmetric encryption (PPEnc) which can be used for data 
clustering [27]. This notion, the authors claim, is most useful in the symmetric 
key setting. A PPEnc scheme is a collection of four algorithms, namely, Setup, 
Encrypt, Decrypt and Test where Test is used to check whether the underly- 
ing messages satisfy a particular property or not. The authors claim that it is 
sufficient to consider a simpler notion called property preserving tag (PPTag), 
obtained by dropping the decryption algorithm. The standard approach is to use
a semantically secure symmetric key encryption scheme to encrypt the “payload”
message while the encryption algorithm of PPTag is used to create a “tag” that is
used as one of the inputs to Test to publicly check whether the message satisfies
the property or not. In fact a similar approach was taken in [28,32]. Following the
Bellare et al. approach for standard encryption [4,5], they define several security 
notions for property preserving encryption such as find-then-guess (FtG) and 
left-or-right (LoR) security. However, unlike Bellare et al. [4] who showed FtG 
implies LoR in the ordinary symmetric key setting, [29] claims that there is a 
separation between FtG and LoR notions and a hierarchy among the FtG classes 
that does not collapse. While the notion of property preserving encryption and
its security are defined in the abstract setting of a general $k$-ary property, the
separation results are conditioned on the assumed existence of a PPEnc for a
concrete binary property based on quadratic residuosity, called $P_{qr}$. Finally, the
paper proposes a scheme for testing orthogonality, which is claimed to be LoR
secure in the generic bilinear group model.

Property preserving encryption has a direct connection with predicate pri- 
vate encryption [32]. In such a scheme, given a token one can check whether a 
ciphertext satisfies a certain predicate or not. A PPTag scheme may be easily con- 
structed from a predicate private encryption scheme by concatenating ciphertext 
and token for a given message. If one starts from a fully secure predicate-private
scheme, one obtains an LoR secure PPTag scheme [1,29]. In [29], the authors also
claim that property preserving encryption is a generalization of order preserving 
encryption of Boldyreva et al. [8-10]. 

Our Motivation. Property preserving symmetric encryption is an interesting 
new concept, with a potential practical application for outsourcing computa- 
tion and it is related to several other primitives like order preserving encryption 
and predicate encryption. Hence it is imperative that this notion be critically
evaluated from the definitional perspective. Because of the separation, designers
working on the problem of constructing property preserving encryption for
various concrete properties may tend to disregard the FtG notion and only aim
at the strongest LoR notion, which is likely to take considerably more resources,
see, for example, [1]. Thus it is natural to ask whether the separation indicates
any real gap between the two notions and generalizes to any concrete property
of interest, or whether it is an artifact related to the peculiarities of the property
considered in [29]. The importance of cryptanalyzing the proposed provably secure
construction requires no further emphasis.

Our Contributions. In Sect. 3, we revisit the separation results of [29]. As no 
concrete construction of FtG-secure scheme for P qr was suggested to validate the 
separation results, we first attempt to build such a scheme. The first observation 
is that the quadratic residuosity property used in the separation results of [29], 
can be generalized to a property preserving test of equality. Hence we focus on 
equality property and show that one-time pad is sufficient to achieve FtG security 
for equality preserving encryption of one-bit messages. Furthermore, the two 
notions of FtG and LoR security in fact collapse in such a deterministic setting. 
This result is further generalized for equality testing of n-bit messages where 
we show a pseudo-random permutation is sufficient to achieve the strongest LoR 
security. Thus, on one hand we can easily generalize the separation results of 
[29] for the equality property, on the other we show that in concrete terms the 
two notions of FtG and LoR effectively collapse for this property. This points to 
the inherent ambiguity with respect to the actual implication of the separation 
results for concrete properties of interest. Thus contextualized, we note that 
the question of whether the separation results of [29] actually indicate any real 
world difference between the two notions of FtG and LoR security for property 
preserving encryption still remains open. 

In Sect. 4, we look at the relation of FtG and LoR in the context of the orthogonality
property. We show that given an FtG secure orthogonality preserving
encryption of vectors of length $2n$, there exists an LoR secure orthogonality
preserving encryption of vectors of length $n$. This result gives further credence
to our already established evidence that FtG is indeed a meaningful notion of
security for property preserving encryption. We also show that in the property
preserving scenario orthogonality implies equality.

In Sect. 5, we cryptanalyze the scheme for testing orthogonality from [29]. We
show that the PPEnc scheme given in [29, Sect. 5] does not even achieve the weakest
notion, selective find-then-guess security, which falsifies the claim [29, Theorem 5.1]
that it is LoR secure. Going beyond indistinguishability, we show that if an adversary
is allowed just one query and then given a ciphertext for some unknown message vector
$x = (x_1, \ldots, x_n)$, s/he can extract significant non-trivial information about $x$,
including whether $x$ is orthogonal to any message of the adversary’s choice. Thus
the attack defeats the very purpose of having property preserving encryption in
the symmetric key setting and may be of independent interest in understanding
the security of cryptographic schemes in the composite order pairing setting.

We draw our conclusion in Sect. 6. Some of the detailed proofs are provided 
in Appendix A.

2 Definitions

We recall the basic definition of property preserving encryption and notions of
its security from [29]. The paper claims that the idea makes most sense in the
symmetric key setting: in the public key setting an adversary can gain non-trivial
information about a target ciphertext by encrypting messages of her own
choice and then testing for the property on the target message.

As in [29], we too model any $k$-ary property on $\mathcal{M}$ as a Boolean function on
$\mathcal{M}^k$. One of the main properties considered is orthogonality, which depends on
computing inner products in finite dimensional vector spaces over finite fields.
Let $v = (v_1, \ldots, v_n)$ and $w = (w_1, \ldots, w_n)$ be vectors over a finite field $\mathbb{F}_q$. The
inner product between them is defined as $v \cdot w = v_1 w_1 + \ldots + v_n w_n \pmod q$.
These vectors are orthogonal if $v \cdot w = 0$.
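
For concreteness, the orthogonality property can be evaluated as in the sketch below. The function names and the small modulus are illustrative choices; vectors are given as tuples of integers and $q$ is assumed to be prime.

```python
def inner_product(v, w, q):
    """Inner product of two equal-length vectors over F_q (q assumed prime)."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(v, w)) % q


def orthogonal(v, w, q):
    """Binary orthogonality property: True if v . w = 0 in F_q."""
    return inner_product(v, w, q) == 0


print(orthogonal((1, 2), (2, 6), 7), orthogonal((1, 2), (1, 1), 7))
```

Note that orthogonality over a finite field differs from the real case: a nonzero vector can be orthogonal to itself, for example $(1, 2)$ over $\mathbb{F}_5$.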

Definition 1. A property preserving encryption scheme (PPEnc) for the $k$-ary
property $P$ is a collection of four probabilistic polynomial time (PPT) algorithms,
which are defined as follows:

1. Setup($1^\lambda$): This takes as input the security parameter and outputs the message
space ($\mathcal{M}$), public parameters (PP) and the secret key (SK).

2. Encrypt(PP, SK, $m$): This algorithm outputs the ciphertext CT corresponding
to the message $m$, using the secret key SK and public parameters PP.

3. Decrypt(PP, SK, CT): This algorithm outputs the plaintext message $m$.

4. Test($CT_1, \ldots, CT_k$, PP): This is a public algorithm that takes as inputs
ciphertexts $CT_1, \ldots, CT_k$ corresponding to messages $m_1, \ldots, m_k$, respectively,
and outputs a bit.

This set of four algorithms must satisfy the standard correctness requirement.
In addition, if the Test algorithm outputs $b \in \{0, 1\}$ then, except with negligible
probability, one has $P(m_1, \ldots, m_k) = b$.

A related notion of PPTag scheme was also defined. Informally, such a scheme 
does not have any decrypt module. 

Definition 2. A property preserving tag scheme (PPTag) for the $k$-ary property
$P$ is a collection of three probabilistic polynomial time (PPT) algorithms, which
are defined as follows:

1. Setup($1^\lambda$): This takes as input the security parameter and outputs the message
space ($\mathcal{M}$), public parameters (PP) and the secret key (SK).

2. Encrypt(PP, SK, $m$): This algorithm outputs the ciphertext CT corresponding
to the message $m$, using the secret key SK and public parameters PP.

3. Test($CT_1, \ldots, CT_k$, PP): This is a public algorithm that takes as inputs
ciphertexts $CT_1, \ldots, CT_k$ corresponding to messages $m_1, \ldots, m_k$, respectively,
and outputs a bit.

This set of algorithms must satisfy the standard correctness requirement. If the
Test algorithm outputs $b \in \{0, 1\}$ then, except with negligible probability, one has
$P(m_1, \ldots, m_k) = b$.

Remark 1. In [29], the authors suggest the following strategy while designing a
secure property preserving encryption scheme. The actual “payload” message is
encrypted using an IND-CPA secure symmetric encryption scheme. For testing
the property, a tag is constructed for each message using a PPTag scheme.


2.1 Security Notions 

Inspired by the study of security notions of symmetric key encryption by Bellare 
et al. [4], Pandey and Rouselakis [29] propose several notions of security for 
property preserving symmetric encryption. These notions are defined by taking 
into account the specific nature of PPEnc. Here we informally describe the two 
notions of security for such schemes which are most relevant to our work. For 
more details refer to [29]. 

Definition 3. For a k-ary property P, any two sequences $X = (x_1, \ldots, x_n)$ and
$Y = (y_1, \ldots, y_n)$ of inputs are said to have the same equality pattern if

  $P(x_{i_1}, \ldots, x_{i_k}) = P(y_{i_1}, \ldots, y_{i_k}), \quad \forall (i_1, \ldots, i_k) \in [n]^k$.
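
As a rough illustration of Definition 3, the following sketch checks the condition by brute force over all index tuples in $[n]^k$. The function names and the example sequences are illustrative assumptions, not part of [29]; the property P is passed in as an ordinary Python callable.

```python
from itertools import product

def same_equality_pattern(P, k, xs, ys):
    """Return True if xs and ys agree on P for every index tuple in [n]^k."""
    if len(xs) != len(ys):
        return False
    n = len(xs)
    return all(
        P(*(xs[i] for i in idx)) == P(*(ys[i] for i in idx))
        for idx in product(range(n), repeat=k)
    )

def equality(a, b):
    """The binary equality property: 1 if a == b, else 0."""
    return int(a == b)

# Repeats occur in the same positions in both sequences.
print(same_equality_pattern(equality, 2, [5, 7, 5], [2, 9, 2]))
# Here the repeats occur in different positions.
print(same_equality_pattern(equality, 2, [5, 7, 5], [2, 9, 9]))
```

The check costs $n^k$ evaluations of P, which is only practical for small k; it is meant to make the definition concrete rather than to be efficient.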


Find-then-Guess Security (FtG). The challenger and the adversary $\mathcal{A} = (\mathcal{A}_1, \mathcal{A}_2)$
play the game $\mathrm{Game}^{\mathrm{FtG}}_{\Pi,\mathcal{A},\lambda}(b)$, which is formally defined in [29, Sect. 3].
After the Setup phase, in $\mathcal{A}_1$, the adversary first adaptively queries the encryption
oracle for messages $(m_1, \ldots, m_t)$. Then the adversary outputs the challenge
messages $(m_0^*, m_1^*)$. In $\mathcal{A}_2$, after the challenger returns the ciphertext of $m_b^*$ for
a random $b \in \{0, 1\}$, the adversary again adaptively queries $(m_{t+1}, \ldots, m_q)$.
The adversary wins the game if s/he correctly predicts the bit b. Adversarial
queries must satisfy the extra condition that the equality patterns
of $(m_1, \ldots, m_t, m_0^*, m_{t+1}, \ldots, m_q)$ and $(m_1, \ldots, m_t, m_1^*, m_{t+1}, \ldots, m_q)$ are the
same; otherwise $\mathcal{A}$ can trivially win the game.


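Definition 4. Let $\Pi$ = (Setup, Encrypt, Decrypt, Test) be a symmetric key property
preserving encryption scheme. Then $\Pi$ is said to be FtG secure if there exists a
negligible function $\nu(\cdot)$ such that for all PPT FtG adversaries $\mathcal{A}$ as above and
for all sufficiently large $\lambda \in \mathbb{N}$, the advantage of $\mathcal{A}$ in the FtG game is negligible:

  $\mathrm{Adv}^{\mathrm{FtG}}_{\Pi,\mathcal{A},\lambda} = \left| \Pr\left[\mathrm{Game}^{\mathrm{FtG}}_{\Pi,\mathcal{A},\lambda}(1) = 1\right] - \Pr\left[\mathrm{Game}^{\mathrm{FtG}}_{\Pi,\mathcal{A},\lambda}(0) = 1\right] \right| < \nu(\lambda)$.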


They [29] further introduce a hierarchy in the FtG notion depending on the
number of challenge queries. In particular, any adversary playing the FtG$^{\eta}$ game,
for $\eta \in \mathbb{N}$, is allowed to make $\eta$ many challenge queries interleaved between
encryption oracle queries. A selective FtG notion may be defined in the usual
way, following [11], where the adversary outputs the challenge messages even
before receiving the public parameters.

Left-or-Right Security (LoR). The challenger and the adversary $\mathcal{A}$ play the
game $\mathrm{Game}^{\mathrm{LoR}}_{\Pi,\mathcal{A},\lambda}(b)$. After setup, $\mathcal{A}$ makes q encryption queries, where each query
is of the form $(m_0^{(i)}, m_1^{(i)})$. The queries are such that the tuples $(m_0^{(1)}, \ldots, m_0^{(q)})$
and $(m_1^{(1)}, \ldots, m_1^{(q)})$ have the same equality pattern. The challenger returns the
encryption of $m_b^{(i)}$ for each i, where the random bit b is chosen at the beginning
of the game. At the end, the adversary has to output a guess b' of b and wins if
b' = b. The game is formally defined in [29, Sect. 3]. The definition of adversarial
advantage is as follows.


Definition 5. Let $\Pi$ = (Setup, Encrypt, Decrypt, Test) be a symmetric key property
preserving encryption scheme. Then $\Pi$ is said to be LoR secure if there exists a
negligible function $\nu(\cdot)$ such that for all PPT LoR adversaries $\mathcal{A}$ as above and
for all sufficiently large $\lambda \in \mathbb{N}$, the advantage of $\mathcal{A}$ in the LoR game is negligible:

  $\mathrm{Adv}^{\mathrm{LoR}}_{\Pi,\mathcal{A},\lambda} < \nu(\lambda)$.


3 Separation Results: A Closer Look 


Let $\mathcal{QR}_p$ (resp. $\mathcal{QNR}_p$) be the set of quadratic residues (resp. quadratic
non-residues) in $\mathbb{Z}_p^*$ for some prime p. Consider the quadratic residuosity property
$P_{qr}$ defined as follows:

  $P_{qr}(m_1, m_2) = \begin{cases} 1 & \text{if } m_1 \cdot m_2 \in \mathcal{QR}_p, \\ 0 & \text{if } m_1 \cdot m_2 \in \mathcal{QNR}_p. \end{cases}$   (1)

Assuming there exists an FtG secure property preserving encryption scheme
$\Pi$ for $P_{qr}$, Pandey and Rouselakis construct an artificial scheme $\Pi'$ which is FtG
but not LoR secure [29, Theorem 4.1]. In a similar fashion they establish that
FtG$^{\eta}$ does not imply FtG$^{\eta+1}$ [29, Theorem 4.4]. Note that (i) the separation results are
specific to the property $P_{qr}$ and (ii) they are conditioned on the existence of an FtG
secure scheme for $P_{qr}$, and no such construction was known or suggested in [29].

Property preserving encryption is a rather broad category and a separa- 
tion based on the specificity of a particular property does not necessarily pro- 
vide enough insight about the relationship between different security notions 
for another concrete property or how two notions are related in general. For 
example, the separation result for P qr in [29] does not give any clue whether the 
same will hold for another property, say orthogonality. Another crucial question
is whether the separation is real or merely an artifact: is there any ‘natural’
construction for a ‘natural’ property that is FtG but not LoR secure?

Clearly, a thorough investigation of these questions requires identifying nat- 
ural properties that encompass other properties and then analysing the real dif- 
ference between security notions of property preserving encryption in the context 
of these natural properties. For example, consider the set of all unary properties.
It is suggested in [29] that for any unary property P, a PPTag scheme can be
trivially obtained by providing P(m) in the clear as part of the ciphertext. We
note that in such a scenario, the two notions FtG and LoR actually collapse. The
case for binary properties, however, is more subtle, as we see next.


3.1 Equivalence Testing via Equality

We demonstrate that certain equivalence relations can be tested via the equality
property; the $P_{qr}$ property used in [29] is one such relation.

Claim 1. To construct a PPTag scheme for $P_{qr}$, it suffices to construct a PPTag
scheme for equality where the message space is $\mathcal{M} = \{0, 1\}$.¹

Proof. The argument is quite straightforward. A “sign” function S was used
by [29] to define $P_{qr}$, where S(m) = 0 if $m \in \mathcal{QR}_p$; else S(m) = 1. In other
words, $P_{qr}$ divides the message space $\mathcal{M} = \mathbb{Z}_p^*$ into 2 equivalence classes. Given
any message m in $\mathbb{Z}_p^*$ one can efficiently determine S(m) and then use the PPTag
scheme for equality over the message space {0, 1} to encrypt S(m). The product of
two messages x and y belongs to $\mathcal{QR}_p$ if and only if both x and y belong to the same
equivalence class. Thus testing whether the product of x and y is a quadratic
residue or not is now reduced to the task of testing whether S(x) and S(y) are
equal or not. □
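
The reduction in the proof of Claim 1 can be checked numerically on a toy prime. The sketch below assumes p is an odd prime and computes the sign function S via Euler's criterion; the helper names are illustrative and do not come from [29].

```python
def sign(m, p):
    """S(m) = 0 if m is a quadratic residue mod p, else 1 (Euler's criterion)."""
    if m % p == 0:
        raise ValueError("m must lie in Z_p^*")
    return 0 if pow(m, (p - 1) // 2, p) == 1 else 1

def p_qr(m1, m2, p):
    """P_qr(m1, m2) = 1 iff m1 * m2 is a quadratic residue mod p."""
    return 1 if sign(m1 * m2, p) == 0 else 0

def check_reduction(p):
    """Check P_qr(x, y) == [S(x) == S(y)] for all x, y in Z_p^*."""
    return all(
        p_qr(x, y, p) == int(sign(x, p) == sign(y, p))
        for x in range(1, p)
        for y in range(1, p)
    )

print(check_reduction(23))
```

In other words, tagging the single bit S(m) with an equality PPTag already suffices to evaluate $P_{qr}$ on the tags.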

The property $P_{qr}$ used in [29] is a particular instance of a larger class of properties
$\mathcal{P}$. In particular, such a property $\mathcal{P}$ induces an equivalence relation on a set $\mathcal{M}$ such
that there exists an efficient algorithm to determine the class in which a given
element lies. Another example of such a property is to test, given two integers m
and n, whether their difference is divisible by a fixed prime p. It is easy to see
that a PPTag scheme for such a property $\mathcal{P}$ can be realized by any PPTag scheme
for equality. Note, however, that there do exist equivalence relations for which
the question of membership testing is not known to be easy.


3.2 Natural LoR Secure Equality Testing 

We describe a property preserving encryption scheme for testing equality over
message space {0, 1}.

1. Setup($1^\lambda$): Set SK = t, where $t \in_R \{0, 1\}$.

2. Encrypt(SK, m): CT = $t \oplus m$.

3. Decrypt(SK, CT): $m' = CT \oplus t$.

4. Test($CT_1$, $CT_2$): Return 1 if and only if $CT_1 = CT_2$.

It is well-known that, as a symmetric key encryption scheme, the above
construction (or any deterministic encryption scheme) is not FtG secure in the sense
of [4]; as a PPEnc, however, it is FtG secure, as the following claim shows.

Claim 2. The above construction is an FtG secure PPEnc for one-bit messages.


¹ Here and afterwards we often focus on PPTag schemes as the problem of constructing
a PPEnc is essentially reduced to the problem of constructing a PPTag scheme
(see Remark 1).

Proof. The key idea is that an FtG adversary $\mathcal{A}$ is restricted by the equality
pattern. If $\mathcal{A}$ makes the challenge query as (0, 1) or (1, 0) then s/he cannot make
any encryption oracle query. Hence, the one-time pad ensures the challenge bit
is information theoretically hidden from $\mathcal{A}$. On the other hand, if the challenge
query is of the form (0, 0) or (1, 1) then there is no non-trivial information for
$\mathcal{A}$ to learn either from the encryption queries or from the challenge. □

The above result further leads us to the following interesting consequence. Let
$E : \mathcal{K} \times \{0, 1\} \to \{C_0, C_1\}$ be a deterministic encryption scheme.

Claim 3. If E is an FtG secure PPEnc scheme for equality then it is LoR secure.

Proof. Let A be a valid LoR adversary for E. We will construct a valid FtG 
adversary B for E, which is playing the FtG game with its own challenger C by 
internally running A. 

Observe that $\mathcal{A}$ has to respect the equality pattern and hence can only
make queries from one of the following disjoint sets: $S_1 = \{(0, 0), (1, 1)\}$ and
$S_2 = \{(0, 1), (1, 0)\}$. If $\mathcal{A}$ makes queries from the set $S_1$, then FtG $\Rightarrow$ LoR
holds trivially.

Now let us analyze the case when $\mathcal{A}$ makes queries from $S_2 = \{(0, 1), (1, 0)\}$.
Let us, without loss of generality, assume that $\mathcal{A}$'s first query is (0, 1). $\mathcal{B}$ sets the
same message pair (0, 1) as its own FtG challenge query and forwards it to $\mathcal{C}$. In response
$\mathcal{C}$ provides a challenge ciphertext $C_b$ to $\mathcal{B}$, $b \in \{0, 1\}$, by encrypting $\beta \in_R \{0, 1\}$
using the encryption function E as per the rule of the FtG game. $\mathcal{B}$ forwards the
same $C_b$ to $\mathcal{A}$. Note that by the definition of FtG security, $\mathcal{B}$ cannot make any
other query to $\mathcal{C}$. However, if $\mathcal{A}$ repeats the same query (0, 1), then $\mathcal{B}$ simply
forwards the same ciphertext $C_b$. If $\mathcal{A}$ queries the other valid message pair (1, 0),
then $\mathcal{B}$ returns the ciphertext $C_{1-b}$. When $\mathcal{A}$ outputs a bit as its guess and halts,
$\mathcal{B}$ outputs the same bit to $\mathcal{C}$ and halts.

The simulation of $\mathcal{A}$'s environment by $\mathcal{B}$ is perfect. In fact, after the first
query, $\mathcal{A}$ can on its own generate the response for all other queries it is going
to make. Now the FtG security of E ensures that the encryption of 1 is
indistinguishable from the encryption of 0. Hence, the advantage of $\mathcal{B}$ is the same as that
of $\mathcal{A}$ and the two notions actually collapse. □

As a consequence we note that the one-time pad construction of PPEnc achieves
LoR security. However, it is well-known that the same construction is not even FtG
secure as a standard symmetric key encryption scheme. Thus there exists a binary
property preserving encryption scheme that is secure in the strong LoR sense of
property preserving encryption but does not even achieve FtG security as a standard
symmetric key encryption scheme.

Based on our previous observations we suggest the following direct construction
of an LoR secure PPEnc for equality testing on $\mathcal{M} = \{0, 1\}^n$. A PPTag can be
obtained by dropping the Decrypt algorithm from the description.²

² A similar construction for testing equality in the context of authenticated encryption
and searchable encryption schemes was suggested earlier by Rogaway-Shrimpton [31]
and Amanatidis et al. [2]. Their constructions used a deterministic MAC which is
modeled as a PRF.

Property Preserving Encryption for Equality. We describe a scheme $\Pi$
to test for equality of strings of length n.³ Let $\{\mathcal{F}\}_n$ be a pseudo-random
permutation (PRP) family, where an element $F \in \{\mathcal{F}\}_n$ is defined as
$F : \{0, 1\}^n \times \{0, 1\}^n \to \{0, 1\}^n$.

1. Setup($1^\lambda$): Set a random n-bit binary string K as the secret key SK.

2. Encrypt(SK, m): CT = $F_K(m)$.

3. Decrypt(SK, CT): Return $F_K^{-1}(CT)$.

4. Test($CT_1$, $CT_2$): Return 1 if and only if $CT_1 = CT_2$.

Claim 4. If the underlying PRP family is secure, then $\Pi$ is LoR secure.

Proof. (Sketch) The claim is established through a simple hybrid argument. Let
the adversary $\mathcal{A}$ for the LoR game set $(m_{0,1}, m_{1,1}), \ldots, (m_{0,t}, m_{1,t})$ as challenges.
We claim that the games $\mathrm{Game}_0 : m_{0,1}, \ldots, m_{0,t}$ and $\mathrm{Game}_1 : m_{1,1}, \ldots, m_{1,t}$ are
indistinguishable. We note that, by the security of the PRP, $\mathrm{Game}_0$ is
indistinguishable from a game where the challenger computes the response from a
random permutation. Similarly, the challenges output in $\mathrm{Game}_1$ will be
indistinguishable from the output of a random permutation. □


3.3 Separation Between FtG and LoR Notions for Equality 

After establishing the existence of a natural PPEnc/PPTag scheme for equality
testing satisfying LoR security (and, hence, FtG security), we now generalize the
result of [29, Theorem 4.1] to show that the separation holds for the equality
property and need not necessarily be restricted to a small number of equivalence
classes. Let $\mathcal{M}$ be the message space. Suppose $z = \lceil \log_2 |\mathcal{M}| \rceil$ so that every
element $m \in \mathcal{M}$ can be represented by a bit string of length z. Note that z (and not
$|\mathcal{M}|$) is a polynomial in the security parameter. Let $\Pi$ = (Setup, Encrypt, Test)
be any FtG secure PPTag scheme for equality. From this scheme we construct
another scheme $\Pi'$ = (Setup', Encrypt', Test') for realizing the same property.
The construction uses a PRF family $\mathcal{F} : \{0, 1\}^{\kappa} \times \{0, 1\}^{z} \to \{0, 1\}^{z}$.⁴

1. Setup'($1^\lambda$): Calls Setup of $\Pi$ to obtain (PP, SK) and chooses $k \in_R \{0, 1\}^{\kappa}$ (as
the key for the PRF). The algorithm outputs PP as the public parameters
for $\Pi'$ and sets the secret key as SK' = (SK, k).

2. Encrypt'(PP, SK', m): While encrypting $m \in \mathcal{M}$, the encryption algorithm of
$\Pi$ is used to obtain ct = Encrypt(PP, SK, m). Then choose a bit $b \in_R \{0, 1\}$.
The ciphertext of $\Pi'$ is computed as

  $CT = \begin{cases} (ct, b, F_k(m)), & \text{if } b = 0, \\ (ct, b, F_k(m) \oplus m), & \text{otherwise.} \end{cases}$

3. Test'($CT_1$, $CT_2$, PP): Given $CT_1 = (ct_1, b_1, t_1)$ and $CT_2 = (ct_2, b_2, t_2)$, the
algorithm outputs Test($ct_1$, $ct_2$, PP).

³ For the case of PPTag there is no need to decrypt and hence the construction can be
extended to arbitrary length messages by the use of a CRHF H with n-bit digests.

⁴ The PRF can be replaced by a set of $|\mathcal{M}|$ random bit strings when $|\mathcal{M}|$ is small
(i.e., polynomial in the security parameter). On the other hand, for arbitrary length
messages one can use a collision resistant hash function (CRHF) H to first map the
message to a digest of z bits and then apply the PRF on the digest.

The following two lemmas generalize the result of [29] and together establish
that the separation result for FtG and LoR holds for the equality property. We provide
the proofs in Appendix A.

Lemma 1. If the scheme $\Pi$ is FtG secure and $\mathcal{F}$ is a secure PRF then $\Pi'$
constructed as above is also FtG secure. In particular, $\epsilon_{\Pi'} \le \epsilon_{\Pi} + 2\epsilon_{\mathcal{F}}$, where
$\epsilon_X$ denotes the advantage in the corresponding security game for the primitive
$X \in \{\Pi, \mathcal{F}, \Pi'\}$.

Lemma 2. There is an LoR adversary for the scheme $\Pi'$ with non-negligible
advantage.
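
To make Lemma 2 concrete, here is a sketch of one natural LoR adversary against $\Pi'$; the proof in Appendix A may differ in its details. The adversary repeats the same query pair $(m_0, m_1)$, which respects the equality pattern, until it has seen ciphertexts with both b = 0 and b = 1, and then XORs their third components to recover the encrypted message. An FtG adversary cannot do this, since it receives a single challenge ciphertext and may not query a challenge message to the encryption oracle. Everything concrete below is an assumption made for illustration: HMAC-SHA256 stands in both for the base scheme $\Pi$ and for the PRF, and the message length Z is arbitrary.

```python
import hashlib
import hmac
import secrets

Z = 16  # message length in bytes (illustrative choice)

def prf(key, m):
    """Stand-in PRF F_k(m), truncated to the message length."""
    return hmac.new(key, m, hashlib.sha256).digest()[:Z]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def setup():
    """SK' = (SK, k): base-scheme key plus an independent PRF key."""
    return secrets.token_bytes(32), secrets.token_bytes(32)

def encrypt(sk, m):
    """CT = (ct, b, F_k(m)) if b = 0, else (ct, b, F_k(m) XOR m)."""
    base_key, k = sk
    ct = hmac.new(base_key, m, hashlib.sha256).digest()  # stand-in for Pi.Encrypt
    b = secrets.randbits(1)
    t = prf(k, m) if b == 0 else xor(prf(k, m), m)
    return ct, b, t

def lor_adversary(oracle, m0, m1, max_queries=64):
    """Guess the LoR bit by repeating the query (m0, m1)."""
    seen = {}
    for _ in range(max_queries):
        _, b, t = oracle(m0, m1)
        seen[b] = t
        if len(seen) == 2:
            recovered = xor(seen[0], seen[1])  # F_k(m) XOR F_k(m) XOR m = m
            return 0 if recovered == m0 else 1
    return secrets.randbits(1)  # both values of b never appeared

def run_game(bit):
    """Play the LoR game with the given challenger bit; return the guess."""
    sk = setup()
    m0, m1 = secrets.token_bytes(Z), secrets.token_bytes(Z)
    oracle = lambda left, right: encrypt(sk, right if bit else left)
    return lor_adversary(oracle, m0, m1)

print(run_game(0), run_game(1))
```

The adversary fails only if all `max_queries` ciphertexts carry the same bit b, which happens with probability $2^{-63}$ for the default of 64 queries.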

Remark 2. We point out an interesting consequence of the above separation
result. Shen-Shi-Waters [32] proposed two security notions, the single challenge
and full challenge security for predicate private symmetric encryption (see [32]
for the definitions of security). The strategy outlined in Lemmas 1 and 2 in the
context of PPTag can be adapted to establish a similar separation between single
challenge and full challenge security of predicate private encryption. Suppose we
are given a single challenge secure predicate private scheme for equality, called
F. From that we construct another scheme F' where the only changes are in
the Setup and Encrypt as described in the context of $\Pi'$ above. In particular,
the encryption algorithm of F' outputs a ciphertext of F together with either
$(b, F_k(m))$ or $(b, F_k(m) \oplus m)$ depending upon whether b = 0 or b = 1. A similar
argument as in the case of PPTag above shows that F' is single challenge secure
but not full challenge secure.

Hierarchy Among FtG Classes. We briefly comment on the separation result
for the hierarchy among FtG classes given in [29]. The reader may refer to the
full version [20] for further details. The equality property over small message
space is used to establish the result. We start with a scheme $\Pi$ which is FtG$^{\eta}$
secure and derive a scheme $\Pi'$ which is not FtG$^{\eta+1}$ secure. For each message m
the Setup algorithm of $\Pi'$ stores a set of random bit strings $\{t_{m,1}, \ldots, t_{m,\eta}\}$ as
part of the secret key. The encryption algorithm of $\Pi'$ chooses $b \in_R \{1, \ldots, \eta + 1\}$ and
returns

  $\Pi'$.Encrypt(PP, SK, m) = ($\Pi$.Encrypt(PP, SK, m), b, val),

where

  $\mathrm{val} = \begin{cases} t_{m,b}, & \text{if } 1 \le b \le \eta, \\ t_{m,1} \oplus \cdots \oplus t_{m,\eta} \oplus m, & \text{if } b = \eta + 1. \end{cases}$

The derived scheme $\Pi'$ is not FtG$^{\eta+1}$ secure, but it is FtG$^{\eta}$ secure.


3.4 The Bottom Line

At this point a reader may wonder what could be a plausible conclusion of our 
analysis. On one hand, a PRP is sufficient to construct LoR secure PPEnc for 
equality and the two notions of FtG and LoR collapse in such a setting. On the 
other, for the same property there is a theoretical gap between FtG and LoR 
notions of security which may or may not be the case for other properties of 
interest. In fact, in the next section we show that for orthogonality any FtG
secure PPEnc for vectors of length 2n gives an LoR secure PPEnc for vectors of
length n.

It seems the only reasonable conclusion is that no conclusive evidence exists
indicating any real world difference between the two notions of security for PPEnc
in general. This leads us to the following open question: is there a ‘natural’
construction of a scheme for testing equality or, for that matter, any other ‘natural’
property, which is FtG secure but not LoR secure? Resolving this question
will shed further light on the usefulness of the hierarchy of security notions
introduced in [29].

4 Orthogonality: Relation Between FtG and LoR and with 
Equality 

We show that it is possible to construct an LoR secure scheme from an FtG secure
scheme for orthogonality, which provides evidence that FtG is a meaningful notion
for property preserving encryption. Next, we show that orthogonality implies
equality in the property preserving scenario.


4.1 FtG$_{2n}$ implies LoR$_{n}$

Shen, Shi and Waters showed [32, Theorem 2.8] that a single challenge secure
symmetric key predicate-only encryption scheme for testing orthogonality of
vectors of length 2n may be used to construct one achieving full security for
vectors of length n. Inspired by their technique we derive a similar result for property
preserving orthogonality testing.

Let $\mathcal{O}_{2n}$ be an FtG secure PPTag encryption scheme for testing orthogonality
of vectors of length 2n. We construct a PPTag scheme $\mathcal{O}_n$ for testing
orthogonality of vectors of length n as follows. In the following we assume that the
underlying field on which the vectors are defined does not have characteristic 2
(this is a technical requirement in the security argument). For $x = (x_1, \ldots, x_n)$
and $y = (y_1, \ldots, y_n)$, as usual $x\|y := (x_1, \ldots, x_n, y_1, \ldots, y_n)$.

1. $\mathcal{O}_n$.Setup($1^\lambda$): The public parameters and the secret key are the same as the
corresponding ones of $\mathcal{O}_{2n}$.

2. $\mathcal{O}_n$.Encrypt(PP, SK, x): The ciphertext is $\mathcal{O}_{2n}$.Encrypt(PP, SK, $x\|x$).

3. $\mathcal{O}_n$.Test($CT_1$, $CT_2$, PP): The test is carried out using that of the $\mathcal{O}_{2n}$ scheme
as $\mathcal{O}_n$.Test($CT_1$, $CT_2$, PP) = 1 if and only if $\mathcal{O}_{2n}$.Test($CT_1$, $CT_2$, PP) = 1.

Next, we show that $\mathcal{O}_n$ is LoR secure. The proof proceeds via a sequence of
hybrids. Any adversary who can distinguish two adjacent games can break the
FtG security of $\mathcal{O}_{2n}$.

Theorem 1. If the scheme $\mathcal{O}_{2n}$ is FtG secure, then the derived scheme $\mathcal{O}_n$ is
LoR secure.

Proof. (Sketch) Recall that we have assumed that the underlying field on which
the vectors are defined does not have characteristic 2. We observe that $x \cdot y = 0$
if and only if $(x\|x) \cdot (y\|y) = 0$. The encoding which maps x to $x\|x$ is used for
proving LoR security via a hybrid argument.
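
The observation about the encoding $x \mapsto x\|x$ follows from a one-line computation, written out here as a worked equation for vectors of length n over the underlying field:

```latex
\[
(x\|x)\cdot(y\|y)
  \;=\; \sum_{i=1}^{n} x_i y_i \;+\; \sum_{i=1}^{n} x_i y_i
  \;=\; 2\,(x\cdot y).
\]
```

When the characteristic is not 2, the factor 2 is invertible, so the left-hand side vanishes exactly when $x \cdot y = 0$. In characteristic 2 the left-hand side is always 0, which is why that case is excluded.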

Let $\mathcal{A}$ be a valid LoR adversary for $\mathcal{O}_n$. The adversary $\mathcal{A}$ sets as challenges
the pairs $(x_0^{(1)}, x_1^{(1)}), \ldots, (x_0^{(q)}, x_1^{(q)})$ to the challenger $\mathcal{C}$. The challenger fixes a
random bit b and returns the encryption of $x_b^{(i)}$, $1 \le i \le q$. The adversary outputs
a bit b' at the end of the game and wins if b = b'.

We prove that the distributions of the ciphertexts of the sequences of messages
$(x_0^{(1)}, x_0^{(2)}, \ldots, x_0^{(q)})$ and $(x_1^{(1)}, x_1^{(2)}, \ldots, x_1^{(q)})$ are indistinguishable. That is,
the adversary $\mathcal{A}$ cannot distinguish the games $G_0$ and $G_1$ of Table 1. The proof
proceeds via a sequence of hybrid games. We tabulate the sequence of hybrids
in Table 1. In $G_b$, the value $\alpha$ is chosen at random from the underlying field. We
mention that a sequence of intermediate games is defined between two consecutive
games for proving indistinguishability, where only one ciphertext is changed.
One such sequence between $G_a$ and $G_b$ is given in Table 1.


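Table 1. Left: sequence of hybrids $G_0$ through $G_1$; right: intermediate games between
$G_a$ and $G_b$

  $G_0$ : $x_0^{(1)}\|x_0^{(1)}, \ldots, x_0^{(q)}\|x_0^{(q)}$
  $G_a$ : $x_0^{(1)}\|0, \ldots, x_0^{(q)}\|0$
  $G_b$ : $x_0^{(1)}\|\alpha x_1^{(1)}, \ldots, x_0^{(q)}\|\alpha x_1^{(q)}$
  $G_c$ : $0\|\alpha x_1^{(1)}, \ldots, 0\|\alpha x_1^{(q)}$
  $G_d$ : $x_1^{(1)}\|\alpha x_1^{(1)}, \ldots, x_1^{(q)}\|\alpha x_1^{(q)}$
  $G_1$ : $x_1^{(1)}\|x_1^{(1)}, \ldots, x_1^{(q)}\|x_1^{(q)}$

  $G_a$     : $x_0^{(1)}\|0, x_0^{(2)}\|0, \ldots, x_0^{(q)}\|0$
  $G_{a,1}$ : $x_0^{(1)}\|\alpha x_1^{(1)}, x_0^{(2)}\|0, \ldots, x_0^{(q)}\|0$
  $G_{a,2}$ : $x_0^{(1)}\|\alpha x_1^{(1)}, x_0^{(2)}\|\alpha x_1^{(2)}, x_0^{(3)}\|0, \ldots, x_0^{(q)}\|0$
  ...
  $G_b$     : $x_0^{(1)}\|\alpha x_1^{(1)}, x_0^{(2)}\|\alpha x_1^{(2)}, \ldots, x_0^{(q)}\|\alpha x_1^{(q)}$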

We first argue that $G_0$ and $G_a$ are indistinguishable. Consider an intermediate
game, called $G_{0,1}$, defined as $x_0^{(1)}\|0, x_0^{(2)}\|x_0^{(2)}, \ldots, x_0^{(q)}\|x_0^{(q)}$.
Notice that this game differs from $G_0$ only in the first component. We claim
that $G_0$ and $G_{0,1}$ are indistinguishable. For, suppose $\mathcal{A}$ can distinguish them.
Setting $(x_0^{(1)}\|x_0^{(1)}, x_0^{(1)}\|0)$ as challenge messages and querying the rest of the
elements, $\mathcal{A}$ can be used to construct a valid FtG adversary for $\mathcal{O}_{2n}$. We proceed
by defining a sequence of games where any two consecutive games vary in exactly
one component. A similar argument shows that $G_b$ and $G_c$ are indistinguishable.
The games $G_c$ and $G_d$ may similarly be shown to be indistinguishable.

Recall that $G_b$ was defined using a random parameter $\alpha$. Even though, say
for example, $(x_0^{(1)}\|0) \cdot (x_0^{(2)}\|0) \neq 0$ holds, it may so happen that
$(x_0^{(1)}\|x_1^{(1)}) \cdot (x_0^{(2)}\|x_1^{(2)}) = 0$. Thus, a random choice of $\alpha$ ensures that setting as the challenge
$(x_0^{(1)}\|0, x_0^{(1)}\|\alpha x_1^{(1)})$ and the rest of the elements as queries one gets a valid FtG
adversary for $\mathcal{O}_{2n}$. This argument shows that $G_a$ and $G_b$ are indistinguishable.
A similar argument shows that $G_d$ and $G_1$ are indistinguishable. □


4.2 A Direct Test for Equality from Orthogonality 

Katz et al. [28] suggested a simple encoding to test for equality using inner
product: create a ciphertext for $\vec{I} = (1, I)$ and a token for $\vec{J} = (-J, 1)$. Now
the inner product of $\vec{I}$ and $\vec{J}$ is 0 if and only if I = J. This encoding does not
directly work for property preserving encryption as there is no separate token
and the Test is performed only on the ciphertexts. Nevertheless, we show that
one can construct a scheme for testing the equality property, given a scheme for
testing orthogonality of vectors. The new scheme inherits the same security as
that of the underlying orthogonality testing scheme. Note that this result is of
theoretical interest, but of little practical value as we already have a much more
efficient scheme for testing equality.

The setting is as follows. Let the message space be $\mathbb{F}_q$, where the finite field
is assumed to contain $i = \sqrt{-1}$. Examples of fields which contain i are $\mathbb{F}_{2^n}$; $\mathbb{F}_p$,
where $p \equiv 1 \pmod 4$; or extensions of the form $\mathbb{F}_q$ which contain i. The square
root of $-1$ may be given explicitly or may be computed using the Tonelli-Shanks
algorithm [3, Chapter 7].

We encode any $x \in \mathbb{F}_q$ as a vector in $\mathbb{F}_q^5$, where the encoding is given by
$x \mapsto v_x = (x^2 + 1, ix^2, ix, ix, i)$ (in characteristic 2 fields $m \mapsto v_m = (m + 1, m, 1)$).
The mapping $m \mapsto v_m$ is one-to-one. Observe that elements x and y are equal if
and only if $v_x \cdot v_y = 0$, since $v_x \cdot v_y = (x - y)^2$. We now describe a scheme $\Pi'$ for
testing equality, given a scheme $\Pi$ for testing orthogonality of vectors of length 5 over $\mathbb{F}_q$.

1. Setup($1^\lambda$): The public parameters and secret key for $\Pi'$ are those of $\Pi$.

2. Encrypt(PP, SK, m): While encrypting $m \in \mathbb{F}_q$, the encryption algorithm
first computes the encoding $v_m$ corresponding to m. Then the ciphertext
corresponding to m is CT = $\Pi$.Encrypt(PP, SK, $v_m$).

3. Test($CT_1$, $CT_2$, PP): Same as that of $\Pi$.
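
The encoding used by this scheme can be checked exhaustively over a small prime field. The sketch below assumes a toy prime $p \equiv 1 \pmod 4$ and finds a square root of $-1$ by brute force rather than by Tonelli-Shanks; the function names are illustrative.

```python
def encode(x, p, i):
    """v_x = (x^2 + 1, i*x^2, i*x, i*x, i) over F_p, where i^2 = -1 mod p."""
    return [(x * x + 1) % p, (i * x * x) % p, (i * x) % p, (i * x) % p, i % p]

def dot(u, v, p):
    """Inner product of two vectors over F_p."""
    return sum(a * b for a, b in zip(u, v)) % p

def sqrt_minus_one(p):
    """Brute-force a square root of -1 mod p (fine for a toy prime)."""
    for c in range(2, p):
        if (c * c + 1) % p == 0:
            return c
    raise ValueError("no square root of -1; need p = 1 (mod 4)")

def encoding_is_sound(p):
    """Check that v_x . v_y = 0 exactly when x = y, for all x, y in F_p."""
    i = sqrt_minus_one(p)
    return all(
        (dot(encode(x, p, i), encode(y, p, i), p) == 0) == (x == y)
        for x in range(p)
        for y in range(p)
    )

print(sqrt_minus_one(13), encoding_is_sound(13))
```

Expanding the inner product gives $(x^2+1)(y^2+1) - x^2y^2 - 2xy - 1 = (x - y)^2$, which is why the check succeeds over any field containing i.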

Lemma 3. If $\Pi$ is FtG (respectively LoR) secure then so is $\Pi'$, correspondingly.

Proof. We describe the FtG case as the LoR case may be similarly handled.
Suppose $\Pi'$ is not FtG secure, with $\mathcal{A}_{\Pi'}$ a valid adversary. We construct $\mathcal{A}_{\Pi}$, an
FtG adversary for the scheme $\Pi$, which internally runs $\mathcal{A}_{\Pi'}$. Whenever $\mathcal{A}_{\Pi'}$ makes
an encryption query m, the adversary $\mathcal{A}_{\Pi}$ forwards $v_m$ to its own challenger for $\Pi$.
On receiving the ciphertext, it forwards it to $\mathcal{A}_{\Pi'}$. When $\mathcal{A}_{\Pi'}$ sets $(m_0^*, m_1^*)$ as
challenge, the adversary $\mathcal{A}_{\Pi}$ forwards $(v_{m_0^*}, v_{m_1^*})$ to the challenger. On receiving
the encryption of one of the two vectors, $\mathcal{A}_{\Pi}$ forwards it to $\mathcal{A}_{\Pi'}$. The other queries
made by $\mathcal{A}_{\Pi'}$ may be handled similarly. When $\mathcal{A}_{\Pi'}$ outputs a bit b' and halts,
so does $\mathcal{A}_{\Pi}$. This is a perfect simulation and $\mathcal{A}_{\Pi}$ wins with the same advantage
as that of $\mathcal{A}_{\Pi'}$. □


5 Cryptanalysis of Pandey and Rouselakis Construction 

The only construction proposed in [29] is a PPTag scheme for testing orthogonal- 
ity of two vectors over a finite field. The proposed scheme works in the composite 
order bilinear pairing setting. It is claimed without proof in [29, Theorem 5.1] 
that the scheme achieves LoR security in the generic group model with a precise 
bound on the adversarial advantage. 

We identify an inherent symmetry in the construction that is required for 
the public Test algorithm. The same symmetry allows the adversary to con- 
struct ‘pseudo-ciphertext’ for many messages from a valid ciphertext of a known 
message. Suitably manipulated pseudo-ciphertext can be exploited by the adver- 
sary to win the indistinguishability game with overwhelming probability. Thus 
the scheme is not even selective FtG secure. However, the properties of pseudo- 
ciphertexts allow an adversary to go even further. We show that, after making 
a single query, an adversary can gain non-trivial information about the underly- 
ing message vector given any valid ciphertext. In particular, the adversary can 
choose any vector and then check whether the unknown message is orthogonal 
to it or not. This effectively negates the main motivation of using the symmetric 
key setting for property preserving encryption. 

5.1 Pandey and Rouselakis Construction 

We recall the scheme of [29] for testing orthogonality of two vectors defined over
a prime field $\mathbb{F}_p$, referred to as the PR scheme hereafter.

1. Setup($1^\lambda$, n): Pick two distinct primes p and q uniformly at random in the
range $(2^{\lambda-1}, 2^{\lambda})$, where $\lambda$ is the security parameter. Let $G$ and $G_T$ be two
groups of order N = pq such that there is an efficiently computable bilinear
map $e : G \times G \to G_T$. Select a vector $(\gamma_1, \ldots, \gamma_n) \in \mathbb{Z}_q^n$ such that
$\sum_{i=1}^{n} \gamma_i^2 = \delta^2 \pmod q$. Let $g_0$ (resp. $g_1$) be a generator of a subgroup of order p (resp.
q) of G. Set the message space as $\mathcal{M} = (\mathbb{Z}_N^* \cup \{0\})^n$. Set

  PP = $(n, N, G, G_T, e)$,  SK = $\langle g_0, g_1, \{\gamma_i\}_{i=1}^{n}, \delta \rangle$.

2. Encrypt(PP, SK, M): On input a message $M = (m_1, \ldots, m_n)$, select two random
elements $\phi$ and $\psi$ from $\mathbb{Z}_N$. The ciphertext is computed as

  $CT = (ct_0, \{ct_i\}_{i=1}^{n}) = \left(g_1^{\psi\delta}, \{g_0^{\phi m_i} g_1^{\psi\gamma_i}\}_{i=1}^{n}\right)$.

3. Test($CT^{(1)}$, $CT^{(2)}$, PP): When two ciphertexts $CT^{(1)} = (ct_0^{(1)}, \{ct_i^{(1)}\}_{i=1}^{n})$ and
$CT^{(2)} = (ct_0^{(2)}, \{ct_i^{(2)}\}_{i=1}^{n})$ are input, the algorithm outputs 1 if and only if:

  $e(ct_0^{(1)}, ct_0^{(2)}) = \prod_{i=1}^{n} e(ct_i^{(1)}, ct_i^{(2)})$.

Correctness ensures that Test outputs 1 only when the underlying messages
are orthogonal, except with a negligible probability.


5.2 A Valid FtG Adversary 

Notice that the construction ensures that the quadratic form relation $\gamma_1^2 + \gamma_2^2 +
\cdots + \gamma_n^2 = \delta^2 \pmod q$ is formed in the exponent for one subgroup element of $G_T$
while the inner product of the two message vectors is computed in the exponent
of the other. However, the above equality implies that $\gamma_1(\gamma_1 + \gamma_2) + \gamma_2(\gamma_2 - \gamma_1) +
\gamma_3^2 + \cdots + \gamma_n^2 = \delta^2 \bmod q$ also holds.

Given a ciphertext for some message $x = (x_1, \ldots, x_n)$, say $(c_0, c_1, c_2, \ldots, c_n)$,
the tuple $W = (c_0, c_1 \cdot c_2, c_2/c_1, c_3, \ldots, c_n)$ may be computed. We can hence
easily see that the tuple W may be used in the Test algorithm in place of a valid
ciphertext of $x' = (x_1 + x_2, x_2 - x_1, x_3, \ldots, x_n)$. The advantage is that, even
though the adversary is forbidden to query x' in the security game, s/he may
still obtain a ciphertext of x if it is a valid query, and then compute and use W
for testing for orthogonality to x'.

Many such relations among the secret key tuple $(\gamma_1, \ldots, \gamma_n)$ exist that are
equal to $\delta^2$. We give more such examples in Lemma 4. This observation
motivates us to define the notion of pseudo-ciphertext.

Definition 6. A pseudo-ciphertext for the PR scheme, associated with a valid message
z, is an element $W_z \in G^{n+1}$ such that Test($CT_x$, $W_z$, PP) = 1 if and only if
Test($CT_x$, $CT_z$, PP) = 1, except with negligible probability, where $CT_x$ and $CT_z$
are properly formed ciphertexts for x and z respectively.

Next, we prove that the scheme of [29] is not FtG secure.

Proposition 1. The PPTag scheme proposed in [29] for testing orthogonality is 
not even selective FtG secure. 

Proof. One can construct a valid selective FtG adversary for the n = 2 case as
follows. The adversary sets (0, 1) and (1, 0) as challenges. Then s/he queries (1, 1)
and forms a pseudo-ciphertext for (2, 0). Since (2, 0) is orthogonal to (0, 1) but not
to (1, 0), the adversary can use that pseudo-ciphertext to trivially win the
indistinguishability game.

Now consider the case where $n \ge 3$. The claim is established in terms of the
following attack game between the adversary ($\mathcal{A}$) and the challenger ($\mathcal{S}$).

(i) $\mathcal{A}$ outputs a pair of n-dimensional vectors as the challenge messages,
where $n \ll N$. The challenges are of the form $\mu_0^* = (m_1, m_0, 1, \ldots, 1)$ and
$\mu_1^* = (m_1, m_1, 1, \ldots, 1)$, where $m_1 \neq m_0$ are from $\mathbb{Z}_N^*$.

(ii) $\mathcal{A}$ receives the public parameter PP from the challenger.

(iii) $\mathcal{A}$ queries $Q = ((m_1 + m_0)/2, (m_0 - m_1)/2, 1, \ldots, 1, -(n - 3))$. Observe that
Q is not orthogonal to either of the challenge messages $\mu_0^*$ and $\mu_1^*$, and hence is
a valid query. $\mathcal{S}$ responds with $CT_Q$, which is equal to

  $\left(g_1^{\psi\delta},\; g_0^{\phi(m_1+m_0)/2} g_1^{\psi\gamma_1},\; g_0^{\phi(m_0-m_1)/2} g_1^{\psi\gamma_2},\; g_0^{\phi} g_1^{\psi\gamma_3},\; \ldots,\; g_0^{\phi} g_1^{\psi\gamma_{n-1}},\; g_0^{-(n-3)\phi} g_1^{\psi\gamma_n}\right)$

for some $\phi, \psi \in_R \mathbb{Z}_N$. Given $CT_Q$, $\mathcal{A}$ takes the product and ratio of the third
and second components of the ciphertext to obtain respectively $g_0^{m_0\phi} g_1^{\psi(\gamma_1+\gamma_2)}$
and $g_0^{-m_1\phi} g_1^{\psi(\gamma_2-\gamma_1)}$. $\mathcal{A}$ now computes the pseudo-ciphertext (Definition 6) $W_{Q'}$
for $Q' = (m_0, -m_1, 1, \ldots, 1, -(n - 3))$ as

  $\left(g_1^{\psi\delta},\; g_0^{m_0\phi} g_1^{\psi(\gamma_1+\gamma_2)},\; g_0^{-m_1\phi} g_1^{\psi(\gamma_2-\gamma_1)},\; g_0^{\phi} g_1^{\psi\gamma_3},\; \ldots,\; g_0^{\phi} g_1^{\psi\gamma_{n-1}},\; g_0^{-(n-3)\phi} g_1^{\psi\gamma_n}\right)$.

Note that the message vector Q' is orthogonal to $\mu_0^*$ but not to $\mu_1^*$.

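(iv) $\mathcal{A}$ now asks for the challenge ciphertext. Suppose that $\mathcal{S}$ responds with an
encryption for $\mu_b^*$:

  $CT_b = \left(g_1^{\psi_b\delta},\; g_0^{m_1\phi_b} g_1^{\psi_b\gamma_1},\; g_0^{m_b\phi_b} g_1^{\psi_b\gamma_2},\; g_0^{\phi_b} g_1^{\psi_b\gamma_3},\; \ldots,\; g_0^{\phi_b} g_1^{\psi_b\gamma_n}\right)$,

where $b \in_R \{0, 1\}$ and $\phi_b, \psi_b \in_R \mathbb{Z}_N$ are as chosen by $\mathcal{S}$.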

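(v) $\mathcal{A}$ runs the Test algorithm on ($CT_b$, $W_{Q'}$, PP). This amounts to computing
the following quantities:

  $A = e(g_1^{\psi\delta}, g_1^{\psi_b\delta})$ and

  $B = e\left(g_0^{m_0\phi} g_1^{\psi(\gamma_1+\gamma_2)},\, g_0^{m_1\phi_b} g_1^{\psi_b\gamma_1}\right) \cdot e\left(g_0^{-m_1\phi} g_1^{\psi(\gamma_2-\gamma_1)},\, g_0^{m_b\phi_b} g_1^{\psi_b\gamma_2}\right)$
      $\cdot \prod_{i=3}^{n-1} e\left(g_0^{\phi} g_1^{\psi\gamma_i},\, g_0^{\phi_b} g_1^{\psi_b\gamma_i}\right) \cdot e\left(g_0^{-(n-3)\phi} g_1^{\psi\gamma_n},\, g_0^{\phi_b} g_1^{\psi_b\gamma_n}\right)$.

If A = B then $\mathcal{A}$ outputs b' = 0, otherwise $\mathcal{A}$ outputs b' = 1.

We see that A = B implies b = 0, except with negligible probability. Hence,
the adversary wins the selective FtG game with overwhelming probability of
success. □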

Remark 3. We give yet another attack on the scheme for even $n$. Let $x = (x_1, \ldots, x_n)$ be any valid message. Observe that both

\[ \delta^2 = \gamma_1(\gamma_1 + \gamma_2) + \gamma_2(\gamma_2 - \gamma_1) + \cdots + \gamma_{n-1}(\gamma_{n-1} + \gamma_n) + \gamma_n(\gamma_n - \gamma_{n-1}), \]
\[ \delta^2 = \gamma_1(\gamma_1 - \gamma_2) + \gamma_2(\gamma_2 + \gamma_1) + \cdots + \gamma_{n-1}(\gamma_{n-1} - \gamma_n) + \gamma_n(\gamma_n + \gamma_{n-1}) \]

hold modulo $q$. Hence, from the ciphertext for $x$, pseudo-ciphertexts for both

\[ \zeta_1 = (x_1 + x_2,\ x_2 - x_1,\ \ldots,\ x_{n-1} + x_n,\ x_n - x_{n-1}) \quad \text{and} \quad \zeta_2 = (x_1 - x_2,\ x_2 + x_1,\ \ldots,\ x_{n-1} - x_n,\ x_n + x_{n-1}) \]

can be formed. Note that neither $\zeta_1$ nor $\zeta_2$ is orthogonal to $x$, while $\zeta_1$ is orthogonal to $\zeta_2$. Thus, for example, after setting $(\zeta_1, x)$ as the challenge pair, querying $x$ and computing the pseudo-ciphertext for $\zeta_2$, the adversary can win the FtG game. A similar attack may also be worked out for odd $n$.

Remark 4. It would have been illuminating to see where exactly the proof of [29, Theorem 5.1] fails. Unfortunately, no such proof is provided by the authors.

5.3 Insecurity Beyond Indistinguishability

Recall that in the ciphertext of the PR scheme described in Sect. 5.1, the message components reside in the exponent and even the party who possesses the secret key does not have the ability to decrypt. Thus it is not reasonable to expect that one can attack the scheme in the sense of message recovery for high min-entropy messages. Our next attack demonstrates that an adversary is still capable of extracting a significant amount of information. This will lead to a total break of the scheme when the messages come from a smaller domain, which could be the case in applications dealing with, for example, certain types of streaming data as envisaged in [29].

We assume that the adversary is allowed to make just one query and is given 
a valid ciphertext as response. We show how the adversary can process the given 
ciphertext and then utilize pairing to unmask the subgroup elements containing 
the message vector of any ciphertext, by working in the target group. 


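Attack for n = 2 Case. Suppose the adversary makes a query $(1/2, 1/2)$ and gets the ciphertext $(c_0, c_1, c_2) = (g_1^{\psi\delta},\ g_0^{\phi/2} g_1^{\psi\gamma_1},\ g_0^{\phi/2} g_1^{\psi\gamma_2})$. Observe that

\[ (c_0,\ c_1 \cdot c_2,\ c_2/c_1) = \left(g_1^{\psi\delta},\ g_0^{\phi} g_1^{\psi(\gamma_1+\gamma_2)},\ g_1^{\psi(\gamma_2-\gamma_1)}\right), \]
\[ (c_0,\ c_1/c_2,\ c_1 \cdot c_2) = \left(g_1^{\psi\delta},\ g_1^{\psi(\gamma_1-\gamma_2)},\ g_0^{\phi} g_1^{\psi(\gamma_1+\gamma_2)}\right) \]

are pseudo-ciphertexts (see Definition 6) for $(1, 0)$ and $(0, 1)$, respectively, which can be computed by the adversary. We represent the formation of the two pseudo-ciphertexts, respectively, via the following two matrices with the obvious interpretation:

\[ M_1 = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \quad \text{and} \quad M_2 = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}. \]

Suppose now the adversary gets a ciphertext for some unknown message $x = (x_1, x_2)$ as $(C_0, C_1, C_2) = (g_1^{\psi'\delta},\ g_0^{\phi' x_1} g_1^{\psi'\gamma_1},\ g_0^{\phi' x_2} g_1^{\psi'\gamma_2})$. With the pseudo-ciphertext for $(1, 0)$, the adversary computes

\[ \frac{e(C_1,\ c_1 \cdot c_2)\, e(C_2,\ c_2/c_1)}{e(C_0,\ c_0)} = \frac{e\left(g_0^{\phi' x_1} g_1^{\psi'\gamma_1},\ g_0^{\phi} g_1^{\psi(\gamma_1+\gamma_2)}\right) \cdot e\left(g_0^{\phi' x_2} g_1^{\psi'\gamma_2},\ g_1^{\psi(\gamma_2-\gamma_1)}\right)}{e\left(g_1^{\psi'\delta},\ g_1^{\psi\delta}\right)} = e(g_0, g_0)^{\phi\phi' x_1}. \]

Thus the adversary now possesses $\left(e(g_0, g_0)^{\phi\phi' x_1},\ e(g_0, g_0)^{\phi\phi' x_2}\right)$, after processing the pseudo-ciphertext for $(0, 1)$ similarly.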

This trivially breaks the FtG security of the PR scheme. Moreover, the adversary can test if $x$ is orthogonal to any $y = (y_1, y_2)$ of his choice by checking whether

\[ \left(e(g_0, g_0)^{\phi\phi' x_1}\right)^{y_1} \cdot \left(e(g_0, g_0)^{\phi\phi' x_2}\right)^{y_2} = 1. \]

The adversary may also test for relations among the message coordinates, like whether $x_1 = a x_2$ for some $a$ in a testable range. If $x$ comes from a small domain then one can exhaustively try all candidate $y$ to check whether $x$ and $y$ are orthogonal and thereby recover $x$ with non-negligible probability.
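The algebra of the n = 2 attack can be checked in a toy "exponent model". The sketch below is only an illustration under stated assumptions: group elements are stored as exponent pairs rather than computed in a real pairing group, the primes `P` and `Q`, the key `GAMMA`/`DELTA`, and all function names are hypothetical, and the ciphertext shape is the one used in the attack above.

```python
import random

# Toy parameters (assumptions): g0 has prime order P, g1 has prime order Q.
P, Q = 1009, 1013
GAMMA, DELTA = (3, 4), 5   # toy secret key with delta^2 = gamma_1^2 + gamma_2^2

# g0^a * g1^b in G (or e(g0,g0)^a * e(g1,g1)^b in G_T) is stored as (a mod P, b mod Q).
def mul(u, v):
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % Q)

def div(u, v):
    return ((u[0] - v[0]) % P, (u[1] - v[1]) % Q)

def pair(u, v):
    # Composite-order pairing with e(g0, g1) = 1: exponents multiply per subgroup.
    return ((u[0] * v[0]) % P, (u[1] * v[1]) % Q)

def encrypt(x, rng):
    """Return (g1^{psi*delta}, g0^{phi*x_i} g1^{psi*gamma_i}) and phi (phi only for checking)."""
    phi, psi = rng.randrange(1, P), rng.randrange(1, Q)
    ct = [(0, psi * DELTA % Q)]
    ct += [(phi * xi % P, psi * gi % Q) for xi, gi in zip(x, GAMMA)]
    return ct, phi

def pseudo_ciphertexts(ct):
    c0, c1, c2 = ct
    w10 = (c0, mul(c1, c2), div(c2, c1))   # pseudo-ciphertext for (1, 0)
    w01 = (c0, div(c1, c2), mul(c1, c2))   # pseudo-ciphertext for (0, 1)
    return w10, w01

def unmask(C, w):
    """Compute e(C1, w1) * e(C2, w2) / e(C0, w0) in the target group."""
    return div(mul(pair(C[1], w[1]), pair(C[2], w[2])), pair(C[0], w[0]))

def orthogonal(t1, t2, y):
    """Check t1^{y1} * t2^{y2} == 1 in the target group."""
    a = (t1[0] * y[0] + t2[0] * y[1]) % P
    b = (t1[1] * y[0] + t2[1] * y[1]) % Q
    return (a, b) == (0, 0)

if __name__ == "__main__":
    rng = random.Random(1)
    half = pow(2, -1, P)                       # the query (1/2, 1/2), reduced mod P
    query_ct, _ = encrypt((half, half), rng)
    w10, w01 = pseudo_ciphertexts(query_ct)
    target_ct, _ = encrypt((7, 11), rng)       # "unknown" message
    t1, t2 = unmask(target_ct, w10), unmask(target_ct, w01)
    print(orthogonal(t1, t2, (11, -7)), orthogonal(t1, t2, (1, 1)))
```

In this model the $g_1$ components cancel exactly because $\gamma_1(\gamma_1+\gamma_2) + \gamma_2(\gamma_2-\gamma_1) = \delta^2$, which is the identity the attack relies on.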


Attack for General n. Before describing the attack, we show that many
pseudo-ciphertexts can be formed from a valid ciphertext.

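Lemma 4. For $1 \le i \le n$, let $M_i = ((m^{(i)}_{st}))$ be an $n \times n$ matrix defined as follows. Define $m^{(i)}_{it} = 1$, $1 \le t \le n$. For $1 \le s \le n$, but $s \ne i$,

\[ m^{(i)}_{st} = \begin{cases} 1, & t = s \\ -1, & t = i \\ 0, & \text{otherwise.} \end{cases} \]

Let $CT = (c_0, c_1, \ldots, c_n)$ be a valid ciphertext for $x = (x_1, \ldots, x_n)$. Define $\zeta_i = M_i x^T$. Define $W_i = (d^{(i)}_0, d^{(i)}_1, \ldots, d^{(i)}_n)$ as follows. For all $j$, define

\[ d^{(i)}_j = \begin{cases} c_0, & \text{if } j = 0 \\ \prod_{k=1}^{n} c_k^{m^{(i)}_{jk}}, & \text{otherwise.} \end{cases} \]

Then $W_i$ is a pseudo-ciphertext for $\zeta_i$.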

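Proof. We provide details for $i = 1$; the general case is similar. Observe that by applying $M_1$ to $x^T$ one obtains $\zeta_1 = (\sum_{i=1}^{n} x_i,\ x_2 - x_1,\ \ldots,\ x_n - x_1)$. We also note that $M_1 (\gamma_1, \ldots, \gamma_n)^T = (\sum_{i=1}^{n} \gamma_i,\ \gamma_2 - \gamma_1,\ \ldots,\ \gamma_n - \gamma_1)$. By an easy computation:

\[ \gamma_1 \sum_{i=1}^{n} \gamma_i + \gamma_2(\gamma_2 - \gamma_1) + \cdots + \gamma_n(\gamma_n - \gamma_1) = \delta^2 \pmod{q}. \]

Let $(g_1^{\psi\delta},\ g_0^{\phi x_1} g_1^{\psi\gamma_1},\ \ldots,\ g_0^{\phi x_n} g_1^{\psi\gamma_n})$ be a valid ciphertext for $x$. From this, we compute a pseudo-ciphertext for $\zeta_1$ as

\[ W_1 = \left( g_1^{\psi\delta},\ g_0^{\phi \sum_i x_i} g_1^{\psi \sum_i \gamma_i},\ g_0^{\phi(x_2 - x_1)} g_1^{\psi(\gamma_2 - \gamma_1)},\ \ldots,\ g_0^{\phi(x_n - x_1)} g_1^{\psi(\gamma_n - \gamma_1)} \right). \]

Let a ciphertext for $y = (y_1, \ldots, y_n)$ be given as

\[ CT_y = (c_0, c_1, \ldots, c_n) = \left( g_1^{\psi'\delta},\ g_0^{\phi' y_1} g_1^{\psi'\gamma_1},\ \ldots,\ g_0^{\phi' y_n} g_1^{\psi'\gamma_n} \right). \]

Suppose we run Test with $CT_y$ and $W_1$. It is easy to see that

\[ \frac{e\left(c_1,\ g_0^{\phi \sum_i x_i} g_1^{\psi \sum_i \gamma_i}\right) \prod_{i=2}^{n} e\left(c_i,\ g_0^{\phi(x_i - x_1)} g_1^{\psi(\gamma_i - \gamma_1)}\right)}{e\left(c_0,\ g_1^{\psi\delta}\right)} = e(g_0, g_0)^{\phi\phi' \langle y, \zeta_1 \rangle} = 1 \]

if and only if $y$ is orthogonal to $\zeta_1$, except with negligible probability. □

Corollary 1. By querying the vector $x = (1/n, \ldots, 1/n)$, one can obtain the pseudo-ciphertexts for each of the unit vectors $\zeta_i = (0, \ldots, 0, 1, 0, \ldots, 0)$ (1 in the $i$th place), $1 \le i \le n$.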
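The two linear-algebra facts behind Lemma 4 and Corollary 1 can be checked numerically. The sketch below is a sanity check under our own naming (`M`, `apply`, `dot` are hypothetical helpers, and the test vector for $\gamma$ is arbitrary): it builds $M_i$ as defined in the lemma, confirms that $M_i$ maps $(1/n, \ldots, 1/n)$ to the $i$th unit vector, and confirms that $\langle \gamma, M_i \gamma \rangle = \sum_s \gamma_s^2$, which is what makes the $g_1$ part of the pairing product cancel against $\delta^2$.

```python
from fractions import Fraction

def M(i, n):
    """Matrix M_i of Lemma 4 (1-based i): row i is all ones;
    row s != i has 1 in column s, -1 in column i and 0 elsewhere."""
    rows = []
    for s in range(1, n + 1):
        if s == i:
            rows.append([1] * n)
        else:
            rows.append([1 if t == s else -1 if t == i else 0
                         for t in range(1, n + 1)])
    return rows

def apply(mat, vec):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

if __name__ == "__main__":
    n = 4
    gamma = [2, 7, 1, 8]                       # arbitrary stand-in for the secret gammas
    for i in range(1, n + 1):
        unit = apply(M(i, n), [Fraction(1, n)] * n)
        print(i, unit, dot(gamma, apply(M(i, n), gamma)) == dot(gamma, gamma))
```

Exact rationals (`Fraction`) are used only to avoid floating-point noise; in the scheme itself the arithmetic is modular.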


In the following theorem we describe the attack for general n. 


Theorem 2. Suppose in the proposed PR scheme of [29] the adversary is allowed to make one query for any message of its choice. Then, given a valid ciphertext for any unknown message $(x_1, \ldots, x_n)$, the adversary can extract the tuple of elements $(\eta^{\theta x_1}, \ldots, \eta^{\theta x_n})$ for some $\eta$ belonging to the order-$p$ subgroup of $G_T$ and $\theta \in \mathbb{Z}_N$.

Proof. Let $(d_0, d_1, \ldots, d_n) = (g_1^{\psi\delta},\ g_0^{\phi/n} g_1^{\psi\gamma_1},\ \ldots,\ g_0^{\phi/n} g_1^{\psi\gamma_n})$ be the ciphertext for the queried message $(1/n, \ldots, 1/n)$. A ciphertext $CT_x$ for some unknown $x = (x_1, \ldots, x_n)$ is given to the adversary, where

\[ CT_x = (c_0, c_1, \ldots, c_n) = \left( g_1^{\psi'\delta},\ g_0^{\phi' x_1} g_1^{\psi'\gamma_1},\ \ldots,\ g_0^{\phi' x_n} g_1^{\psi'\gamma_n} \right). \]

Notice that the unit vector $\zeta_i$ can be written as $\zeta_i = M_i (1/n, \ldots, 1/n)^T$. From Lemma 4, the adversary can compute $W_i = (w^{(i)}_0, w^{(i)}_1, \ldots, w^{(i)}_n)$, a pseudo-ciphertext for $\zeta_i$, as

\[ W_i = \left( g_1^{\psi\delta},\ g_1^{\psi(\gamma_1 - \gamma_i)},\ \ldots,\ g_1^{\psi(\gamma_{i-1} - \gamma_i)},\ g_0^{\phi} g_1^{\psi \sum_j \gamma_j},\ g_1^{\psi(\gamma_{i+1} - \gamma_i)},\ \ldots,\ g_1^{\psi(\gamma_n - \gamma_i)} \right). \]

The adversary further computes $\left(\prod_{j=1}^{n} e(c_j, w^{(i)}_j)\right) / e(c_0, w^{(i)}_0) = e(g_0, g_0)^{\phi\phi' x_i}$.

In a similar fashion, the adversary obtains a tuple over the order-$p$ subgroup of the target group $G_T$ as $\Omega = \left(e(g_0, g_0)^{\phi\phi' x_1}, \ldots, e(g_0, g_0)^{\phi\phi' x_n}\right)$. The adversary now computes $\eta := \left(\prod_{i=1}^{n} e(d_i, d_i)\right) / e(d_0, d_0) = e(g_0, g_0)^{\phi^2/n}$. Rewriting $\Omega$ as powers of $\eta$, s/he gets $\Omega = (\eta^{\theta x_1}, \ldots, \eta^{\theta x_n})$. Hence the result. □


As already pointed out for the n = 2 case, the above argument shows that the adversary is capable of extracting a lot of information from the ciphertext of any unknown message vector x. Recall that the fundamental reason for having PPTag in the symmetric setting is to prevent the adversary from being able to test whether a ciphertext of some unknown message satisfies a certain property and thereby learn some non-trivial information about the message. Given the extracted tuple, the adversary can do precisely that, and thus the scheme in [29] defeats the very purpose of symmetric key property preserving encryption.


6 Concluding Remarks 

In this work we perform a comprehensive (crypt)analysis of property preserving symmetric encryption. On the definitional front, we revisit the FtG and LoR separation result in [29]. To do that, we show that the equality property captures the property $P_{qr}$ used in the separation results and provide a simple construction for the equality property to demonstrate that the separation results are non-vacuous. Based on the security attributes of our construction and its generalization, we raise the pertinent question of whether the separation results actually indicate any real-world difference between the two notions of security and argue for a property-specific study of the security notions. Continuing further in this direction, we see that an LoR-secure scheme may be constructed from a so-called weaker FtG-secure one for orthogonality. We demonstrate several attacks on the PPTag scheme for testing orthogonality from [29], refuting the claim that the scheme is provably secure. Our main attack successfully unmasks the subgroup elements to which the message vector is mapped and thereby points to greater vulnerability beyond the notion of indistinguishability.

A Appendix

We first argue the separation result for polynomial size message space and use it to prove the general case.


A.1 Separation Result for Polynomial Size Message Space

Let $\mathcal{M} = \{\alpha_i \mid 1 \le i \le l\}$ be the message space, where each $\alpha_i$ can be represented by a $z$-bit string with $z = \lceil \log_2 l \rceil$. We argue the separation result FtG $\not\Rightarrow$ LoR for the equality property in the case where $l$ is polynomial in the security parameter. Let $\Pi$ be an FtG secure PPTag scheme for equality over $\mathcal{M}$. From this scheme we construct another scheme $\Pi'$ for realizing the same property as follows.


1. $\Pi'$.Setup($1^\lambda$): The public parameters for $\Pi'$ are exactly those of $\Pi$. The secret key SK' of $\Pi'$ comprises that of $\Pi$ and a set of binary strings $\{t_i \mid 1 \le i \le l\}$, where each $t_i$ is of length $z$ and chosen independently and uniformly at random.

2. $\Pi'$.Encrypt(PP, SK', m): Suppose $m = \alpha_i$. The algorithm chooses a random bit $b$ and the output is defined as

\[ \Pi'.\mathrm{Encrypt}(PP, SK', m) = \begin{cases} (\Pi.\mathrm{Encrypt}(PP, SK, m),\ b,\ t_i), & \text{if } b = 0, \\ (\Pi.\mathrm{Encrypt}(PP, SK, m),\ b,\ t_i \oplus \alpha_i), & \text{otherwise.} \end{cases} \]

3. $\Pi'$.Test($CT_1$, $CT_2$, PP): Same as that of $\Pi$, where only the relevant parts of the ciphertexts are used.


Lemma 5. If the scheme $\Pi'$ is not FtG secure, then $\Pi$ is not FtG secure.

Proof. Consider a valid FtG adversary for $\Pi'$, denoted by A. We describe how an FtG adversary B for $\Pi$, with the same advantage as that of A and which internally uses A, can be constructed.

(i) B forwards to A whatever is received from its own challenger as public parameters of $\Pi$ and initializes an empty table T.

(ii) Whenever A makes an encryption query for $m = \alpha_i$, $1 \le i \le l$, B forwards it to the simulator of $\Pi$. On receiving ct from the simulator, B checks whether the same query was made earlier or not. If the query is made for the first time, then it chooses $t \in_R \{0, 1\}^z$, sets $t_i = t$ and updates the table T with $(\alpha_i, t_i)$. Else, B reuses the corresponding $t_i$ from T. Finally B chooses a random bit $b$ and forwards to A

\[ \begin{cases} (ct,\ b,\ t_i), & \text{if } b = 0, \\ (ct,\ b,\ t_i \oplus \alpha_i), & \text{if } b = 1. \end{cases} \]

(iii) After a certain number of encryption queries A outputs the challenge $(m_0^*, m_1^*)$. Two cases arise with respect to the challenges, which we describe below.

Case 1: The challenge messages $m_0^*$ and $m_1^*$ are equal.

Case 2: The challenge messages $m_0^*$ and $m_1^*$ are different. In this case, the adversary cannot make encryption queries for these two messages.

B forwards $(m_0^*, m_1^*)$ to the simulator of $\Pi$ and gets ct*. If the challenge messages are equal (Case 1), then (ct*, b, val) may be computed by B in the same way as it responds to the encryption queries. If the challenge messages are different (Case 2), then neither $m_0^*$ nor $m_1^*$ has been queried previously. B returns (ct*, b, t*), where $b \in_R \{0, 1\}$ and $t^* \in_R \{0, 1\}^z$. Let $\alpha_j \in \{m_0^*, m_1^*\}$ be the unknown message chosen by the simulator of $\Pi$. The strategy adopted by B gives a perfect simulation. This is because if $b = 0$ then $t^*$ can be set as $t_j$, whereas for $b = 1$, $t^*$ can be set as $t_j \oplus \alpha_j$.

(iv) B follows the same strategy of step (ii) above to answer all the subsequent encryption queries of A.

(v) When A outputs a bit b' and halts, so does B.

Notice that all the ciphertexts which B computes for forwarding to A are properly distributed. B is a polynomial time algorithm and provides a perfect simulation. Hence, the advantage of B is equal to that of A. □


Lemma 6. There is an LoR adversary for the scheme $\Pi'$ with non-negligible advantage.


Proof. A valid LoR adversary sets as $u$ challenges the same pair of the form $(m_0, m_1)$, with $m_0 \ne m_1$. The equality pattern is clearly preserved between the left and right sequences. If the challenger outputs two ciphertexts for which the $b$-values are distinct, then the XOR of their third components equals the encrypted message, so the adversary can immediately distinguish the two sequences. The advantage will be $1 - 2^{-u+1}$. □
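The distinguishing step in the proof of Lemma 6 can be made concrete with a toy model of the modified scheme. The sketch below is an illustration under assumptions that are ours, not the paper's: the base scheme's Encrypt is replaced by a deterministic HMAC tag (a hypothetical stand-in that merely keeps equality testable), messages are 16-bit integers, the strings $t_i$ are sampled lazily instead of at setup, and the names `PiPrime` and `recover_message` are made up.

```python
import hashlib
import hmac
import secrets

Z = 16  # message length in bits (toy assumption)

class PiPrime:
    """Toy model of the modified scheme: base ciphertext plus (b, t_i) or (b, t_i XOR alpha_i)."""

    def __init__(self):
        self.key = secrets.token_bytes(16)   # stand-in key for the base scheme
        self.tags = {}                       # t_i per message, sampled on first use

    def _base_encrypt(self, m):
        # Hypothetical stand-in for the base Encrypt: a deterministic tag.
        return hmac.new(self.key, m.to_bytes(Z // 8, "big"), hashlib.sha256).digest()

    def encrypt(self, m, b=None):
        t = self.tags.setdefault(m, secrets.randbits(Z))
        if b is None:
            b = secrets.randbits(1)
        return (self._base_encrypt(m), b, t if b == 0 else t ^ m)

def recover_message(ciphertexts):
    """Given encryptions of one message, return it if both b-values occur, else None."""
    third_by_bit = {b: third for _, b, third in ciphertexts}
    if len(third_by_bit) == 2:
        return third_by_bit[0] ^ third_by_bit[1]   # t XOR (t XOR m) = m
    return None

if __name__ == "__main__":
    scheme = PiPrime()
    cts = [scheme.encrypt(0xBEEF) for _ in range(8)]   # u = 8 challenge ciphertexts
    print(recover_message(cts))
```

The attack fails only when all $u$ bits coincide, which happens with probability $2 \cdot 2^{-u}$, matching the advantage stated in the proof.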


The strategy outlined in the above proof can be used to prove Lemma 2.

A.2 Proof of Lemma 1

Recall that in the FtG game A makes a polynomial number of encryption oracle queries $m_i$, $1 \le i \le q$, and a single challenge query $(m_0^*, m_1^*)$ maintaining the equality pattern. Two cases arise depending upon whether the challenge messages $m_0^*$ and $m_1^*$ are equal or not. If $m_0^* = m_1^*$ then it is easy to see that any advantage of A against $\Pi'$ translates into the same advantage against $\Pi$. Hence, we consider the case when $m_0^* \ne m_1^*$. Note that in this case none of the queries $m_i$ to the encryption oracle is equal to $m_b^*$, for $b \in \{0, 1\}$. Otherwise, the equality pattern of the two sequences will be different, allowing A to trivially distinguish.

Let Game$_0$ correspond to the queries $(m_1, \ldots, m_i, m_0^*, m_{i+1}, \ldots, m_q)$ while Game$_1$ corresponds to the queries $(m_1, \ldots, m_i, m_1^*, m_{i+1}, \ldots, m_q)$ made by the adversary. Suppose A can distinguish whether it is playing Game$_0$ or Game$_1$ with a non-negligible advantage $\epsilon$. The proof will proceed through a hybrid argument. Given an adversary A against $\Pi'$ we construct a series of four games and then show that if A can distinguish between any two consecutive games then we can construct either a distinguisher against the PRF or an FtG adversary against $\Pi$.

Game$_0$: The challenger runs the Setup algorithm of $\Pi'$, gives the PP to A and keeps the secret key SK' = (SK, k) to itself. The challenger computes the ciphertexts corresponding to $(m_1, \ldots, m_i, m_0^*, m_{i+1}, \ldots, m_q)$ using SK' as per the encryption algorithm of $\Pi'$ and gives them to A.

Game$'_0$: The challenger runs the Setup algorithm of $\Pi$, gives the PP to A and keeps the secret key SK of $\Pi$ to itself. Note that the challenger does not generate the PRF key k; instead it will maintain a table T = $(x_i, y_i)$ where $x_i$ and $y_i$ are two $z$-bit strings. The first entry in each row of T corresponds to the messages queried by A while the second entry is a random bit-string. The table is initially empty. Whenever A makes an encryption query for a message x, the challenger first checks whether there is a corresponding entry in T. If not, it chooses a random $z$-bit string y and enters (x, y) in the table T sorted according to the first entry. A makes encryption queries for $(m_1, \ldots, m_i, m_0^*, m_{i+1}, \ldots, m_q)$. To answer the query of A for a message, say x, the challenger computes the ciphertext of $\Pi$ on x and then uses the corresponding random string y from the entry (x, y) in T to create a ciphertext of $\Pi'$. Note that A makes at most q encryption oracle queries and a single challenge query. So the size of T is O(q) and hence the challenger can consistently respond to all the queries of A.

Claim 5. If A can decide with a non-negligible advantage whether it is playing Game$_0$ or Game$'_0$ then we can construct a PRF distinguisher with the same advantage.

Recall that in the PRF security game we are provided with an oracle which is either a function from the PRF family or a random function. In the former case the challenger will be playing Game$_0$ while in the latter case it will be playing Game$'_0$. Hence, any advantage of A in distinguishing between the two games translates into the same advantage of the challenger in breaking the PRF security.

Game$_1$ (resp. Game$'_1$) will be identical to Game$_0$ (resp. Game$'_0$) except for the fact that A now queries with $(m_1, \ldots, m_i, m_1^*, m_{i+1}, \ldots, m_q)$. An identical argument as in the claim above establishes that any advantage of A in deciding whether it is playing Game$_1$ or Game$'_1$ translates into the same PRF advantage for the challenger.

Note that the only difference between the two table-based games Game$'_0$ and Game$'_1$ is in the challenge ciphertext (corresponding to $m_0^*$ and $m_1^*$). The challenge is computed by calling the encryption algorithm of $\Pi$ and appending either a random bit string or a one-time encryption of the challenge message (using that random string). Hence, an adversary distinguishing between Game$'_0$ and Game$'_1$ can be converted into an adversary breaking the FtG security of $\Pi$. As there are only polynomially many queries, this case is the same as the one where there is only a small (polynomial in $\lambda$) number of messages. This case can be easily handled by using random strings. We have already given the analysis in the proof of Lemma 5.

Abstract. We study two open problems proposed by Wagner in his seminal work on the generalized birthday problem. First, with the use of multicollisions, we improve Wagner's k-tree algorithm that solves the generalized birthday problem for the cases when k is not a power of two. The new k-tree only slightly outperforms Wagner's k-tree. However, in some applications this suffices, and as a proof of concept, we apply the new 3-tree algorithm to slightly reduce the security of two CAESAR proposals. Next, with the use of multiple collisions based on Hellman's table, we give improvements to the best known time-memory tradeoffs for the k-tree. As a result, we obtain a new tradeoff curve $T^2 \cdot M^{\lg k - 1} = k \cdot N$. For instance, when k = 4, the tradeoff has the form $T^2 M = 4 \cdot N$.


Keywords: Generalized birthday problem · k-list problem · k-tree algorithm · Time-memory tradeoff


1 Introduction 

Arguably, the most popular problem in private key cryptography is the collision search problem. It appears frequently not only in its classical usage, e.g. finding collisions for hash functions, but also as an intermediate subproblem of a wider cryptographic problem. The collision search has been widely studied and is well understood. Besides this problem, and along with the search of multicollisions and multiple collisions, perhaps the next most popular is the generalized birthday problem (GBP).

The GBP is defined as follows: given k lists of random elements, choose a single element in each list, such that all the chosen elements sum up to a predefined value. Wagner is the first to investigate the GBP for all values of k and as an independent problem. In his seminal paper [31], he proposes an algorithm to solve the GBP for all values of k and shows a wide variety of applications ranging from blind signatures, to incremental hashing, low weight parity checks, and cryptanalysis of various hash functions.

Prior to Wagner, the GBP has been mostly studied in the context of its applications and only for a concrete number of lists (usually four lists, i.e. k = 4). Schroeppel and Shamir [28] find all solutions to the 4-list problem. Bernstein [4] uses a similar algorithm to enumerate all solutions to a particular equation. Boneh, Joux and Nguyen [10] use Schroeppel and Shamir's algorithm for solving integer knapsacks, as does Bleichenbacher [8] in his attack on DSA. Chose, Joux, and Mitton [11] use it to speed up the search for parity checks for stream cipher cryptanalysis. Joux and Lercier [19] use related ideas in point-counting algorithms for elliptic curves. Blum, Kalai, and Wasserman [9] apply it to find the first known subexponential algorithm for the learning parity with noise problem. Ajtai, Kumar, and Sivakumar [1] use Blum, Kalai, and Wasserman's algorithm as a subroutine to speed up the shortest lattice vector problem.

To solve the k-list problem, Wagner proposes a so-called k-tree algorithm. In a nutshell, the k-tree is a divide and conquer approach and at each step it operates on only two lists. The step operations are based on a simple collision search. When the k lists are composed of n-bit words, Wagner's k-tree algorithm solves the GBP in $O(k \cdot 2^{n/(\lfloor \lg k \rfloor + 1)})$ time and memory and requires lists of around $2^{n/(\lfloor \lg k \rfloor + 1)}$ elements.$^1$

Even though the GBP has been shown to be very important to many problems in cryptography, more than a decade after its publication neither a significant improvement to the k-tree algorithm nor other dedicated algorithms have emerged. However, moderate improvements and refinements have been published. As the most important of these, we single out the extended k-tree algorithm by Minder and Sinclair [21], which provides a solution to the GBP when the lists have smaller sizes, and the time-memory tradeoffs by Bernstein et al. [5,6].


Our Contribution. Wagner points out a few open problems of the GBP and of the k-tree algorithm. Two of these problems, namely, improving the efficiency of the k-tree when k is not a power of two and memory reduction of the k-tree, are in fact the main research topics of our paper.

The k-tree algorithm discards part of the lists when k is not a power of two (note how the complexity of the k-tree takes the lower bound $\lfloor \lg k \rfloor$). For instance, the 7-tree works only with 4 lists, while the remaining 3 lists are not processed. Our first improvement to the k-tree is to work with the discarded lists (we call them passive lists) by creating multicollisions from the lists. From each of the passive lists we create a multicollision set of values that coincide on certain l bits, where l < n. Then, we produce several solutions with the k-tree from the other (active) lists, and for the same l bits. Finally, the remaining n − l bits are absorbed by combining the multicollisions from the passive lists, and the solutions from the active lists. The advantage of our approach over the classical k-tree is limited by the size of the multicollision sets, which in turn is bounded by the value of n. The speed-up factor can be approximated as $a \cdot n^c / \lg(b \cdot n)$, where a, b, c are constants that depend on k. The speed-up is sufficient to break the $O(2^{n/2})$ complexity bound for the 3-list problem and to show that in applications this can matter. As an example, we show a security reduction of two authenticated encryption CAESAR [3] proposals, Deoxys [16] and KIASU [18], based on the latest results of Nandi [22]. He shows that a forgery attack for COPA based candidates can be reduced to the 3-list problem. We apply our improved 3-tree algorithm to this problem and reduce the security bound of the candidates by 2 bits.

1 Note, we use lg for $\log_2$.

Our second contribution is a set of time-memory tradeoffs for the k-tree algorithm. This research topic has been investigated by Bernstein et al. Their best tradeoffs are described with the curves $T \cdot M^{\lg k} = k \cdot N$ and $T^2 \cdot M^{\lg k - 2} = \;\cdot\; N$, where M and T are the memory and time complexity, respectively, and N is the size of the space of elements. To achieve a better tradeoff, we play around with the idea of producing multiple collisions in a memory constrained environment with the use of Hellman's tables.$^2$ It allows us to significantly reduce the memory complexity of the first level of the k-tree algorithm and to achieve better tradeoffs. As a result, we obtain the tradeoff $T^2 M^{\lg k - 1} = k \cdot N$. This translates to $T^2 M = 4 \cdot N$ for k = 4, and $T^2 M^2 = 8 \cdot N$ for k = 8 (cf. the $T M^2 = 4 \cdot N$ and $T M^3 = 8 \cdot N$ curves of Bernstein et al.). As illustrated further in Fig. 6, for a given amount of memory, the new tradeoff always leads to a lower time complexity than the previous tradeoffs. The improvement of the tradeoff can be seen on the case of the generalized birthday problem for the hash function SHA-160 and k = 8. Our new tradeoff requires around $2^{50}$ SHA-1 computations and $2^{30}$ memory on 8 cores (with the use of van Oorschot and Wiener's parallel collision search [30]), while with the same memory, the old tradeoff needs around $2^{65}$ SHA-1 computations.
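The comparison between the new curve and the $T \cdot M^{\lg k} = k \cdot N$ curve mentioned in the parenthetical can be explored with a few lines of arithmetic. The sketch below is ours (function names are hypothetical) and works with base-2 logarithms; it gives single-processor estimates that ignore constants and parallelism, so it only roughly tracks the concrete figures quoted in the text, and it does not model the second ($T^2$) curve of Bernstein et al.

```python
from math import log2

def lg_time_new(k, lg_n_space, lg_mem):
    """lg T on the curve T^2 * M^(lg k - 1) = k * N."""
    return (log2(k) + lg_n_space - (log2(k) - 1) * lg_mem) / 2

def lg_time_tm_curve(k, lg_n_space, lg_mem):
    """lg T on the curve T * M^(lg k) = k * N."""
    return log2(k) + lg_n_space - log2(k) * lg_mem

if __name__ == "__main__":
    # 160-bit hash output, k = 8 lists, 2^30 memory
    print(lg_time_new(8, 160, 30), lg_time_tm_curve(8, 160, 30))
```

For a fixed k and N, both functions are linear in lg M, which makes it easy to tabulate the two curves over a range of memory budgets.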

2 The Generalized Birthday Problem 

Wagner introduced the generalized birthday problem (GBP) as a multidimensional generalization of the birthday problem. The GBP is also called the k-list problem, and is formalized as follows:

Problem 1. Given k lists $L_1, \ldots, L_k$ of elements drawn uniformly and independently at random from $\{0, 1\}^n$, find $x_1 \in L_1, \ldots, x_k \in L_k$ such that $x_1 \oplus x_2 \oplus \ldots \oplus x_k = 0$.

Obviously, if $|L_1| \times |L_2| \times \ldots \times |L_k| \ge 2^n$, then with a high probability the solution to the problem exists. The real challenge, however, is to find it efficiently.

When all the lists have a minimal size, i.e. $|L_i| = 2^{n/k}$, efficient algorithms to the k-list problem are known only for the cases when k = 2, and $k \ge n$. The former is due to the collision search algorithm, i.e. the 2-list problem is equivalent to finding collisions, thus it can be solved in $2^{n/2}$. The latter is due to the Bellare and Micciancio result [2], which states that such a problem can be solved by Gaussian elimination in $O(n^3 + kn)$. A trivial algorithm is known for the k-list problem when 2 < k < n. The algorithm first creates two larger lists $\mathcal{L}_1, \mathcal{L}_2$, where $\mathcal{L}_1 = \{X \mid X = x_1 \oplus \ldots \oplus x_{k/2},\ x_i \in L_i\}$ and $\mathcal{L}_2 = \{Y \mid Y = x_{k/2+1} \oplus \ldots \oplus x_k,\ x_i \in L_i\}$, and subsequently it finds a collision between the two lists. The size of the lists is $2^{n/2}$, thus the time complexity of the algorithm is $O(2^{n/2})$.

2 Joux and Lucks [20] use this technique to generate multiple collisions, which later lead to multicollisions.

Fig. 1. Wagner's 4-tree algorithm.

Wagner proposed the k-tree algorithm that solves the GBP (k-list problem) faster under the assumption that the list sizes are larger. Further we describe the case when k = 4, refer to Fig. 1. Let us define $S \bowtie T$ as a list of elements common to both S and T, and let $\mathrm{low}_l(x)$ stand for the l least significant bits of x. Furthermore, let us define $S \bowtie_l T$ as a set that contains all pairs from $S \times T$ that agree in their l least significant bits (the xor on the least significant bits is zero). Assume $L_1, L_2, L_3, L_4$ are four lists, each containing $2^l$ elements (l will be defined further). First we create a list $L_{12}$ of values $x_1 \oplus x_2$, where $x_1 \in L_1, x_2 \in L_2$, such that $\mathrm{low}_l(x_1 \oplus x_2) = 0$. Similarly, we create a list $L_{34}$ of values $x_3 \oplus x_4$, where $x_3 \in L_3, x_4 \in L_4$, such that $\mathrm{low}_l(x_3 \oplus x_4) = 0$. Finally, we search for a collision between $L_{12}$ and $L_{34}$. It is easy to see that such a collision reveals the required solution, i.e. $x_1 \oplus x_2 \oplus x_3 \oplus x_4 = 0$.

The main advantage of the k-tree algorithm lies in the way the solution is found: at each of the two levels, only a simple collision search algorithm is used, and only a certain number of bits is made to fulfill the final goal (the xor is zero on all bits). At the first level, the lists $L_{12}, L_{34}$ contain words that have zeros on the l least significant bits, thus the xor of any two words from the lists must have zeros on these bits. At the second level, the xor of the words from the two lists will result in zeros on the remaining n − l bits, if there are enough pairs. To get the sufficient number of pairs, the value of l is defined as l = n/3. Then each of $L_{12}, L_{34}$ will have $2^{n/3} \cdot 2^{n/3} / 2^{n/3} = 2^{n/3}$ words, and thus at the second level there will be $2^{n/3} \cdot 2^{n/3} = 2^{2n/3}$ possible xors, one of which will have zeros on the remaining n − n/3 = 2n/3 bits. It is important to note that l is chosen so as to balance the complexity of the two levels. Obviously, the total memory and the time complexities of the 4-tree algorithm are $O(2^{n/3})$ each.

The very same idea is used to tackle any k-list problem, where k is a power of two. The only difference is in the choice of l, and in the number of levels. In general, the number of levels equals lg k, and at each level except the final, additional l bits are set to zero. At the final level, the remaining 2l bits are zeroed. Hence, $l \cdot \lg k + l = n$, and thus $l = n/(\lg k + 1)$. The algorithm works in $O(k \cdot 2^{n/(\lg k + 1)})$ time and memory and requires lists of sizes $2^{n/(\lg k + 1)}$. As an example, let us focus on the 8-list problem, i.e. we have lists $L_1, \ldots, L_8$, lg 8 = 3 levels, and l = n/4. At the first level we build $L_{12}, L_{34}, L_{56}, L_{78}$, by combining two lists $L_i, L_j$, each with $2^l = 2^{n/4}$ elements that have zeros in the n/4 least significant bits. At the second, we build $L_{1234}$ and $L_{5678}$ that have again $2^{n/4}$ elements with zeros in the next n/4 bits. Finally, at the third level, we find the solution that will have zeros on the remaining n/2 bits.
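The level-by-level merging described above can be sketched for any k that is a power of two. The code below is a minimal illustration, not an optimized implementation: the function names are ours, each list entry carries its witnesses (the original elements) so that a solution can be reported, and, as in the analysis, a run on lists of size about $2^{n/(\lg k + 1)}$ finds a solution only with constant probability.

```python
import random
from collections import defaultdict

def join(A, B, mask):
    """All (a ^ b, witnesses) with (a ^ b) & mask == 0.
    A and B hold (value, witnesses) pairs; witnesses of A come first."""
    index = defaultdict(list)
    for va, wa in A:
        index[va & mask].append((va, wa))
    out = []
    for vb, wb in B:
        for va, wa in index.get(vb & mask, ()):
            out.append((va ^ vb, wa + wb))
    return out

def k_tree(lists, n):
    """Return (x_1, ..., x_k) with x_i in lists[i] and zero xor, or None."""
    k = len(lists)
    levels = k.bit_length() - 1            # lg k
    assert levels >= 1 and k == 1 << levels, "k must be a power of two"
    l = n // (levels + 1)                  # bits cleared per non-final level
    cur = [[(x, (x,)) for x in L] for L in lists]
    for level in range(1, levels + 1):
        if level == levels:
            mask = (1 << n) - 1            # final level: all remaining bits
        else:
            mask = (1 << (level * l)) - 1  # low level*l bits must be zero
        cur = [join(cur[i], cur[i + 1], mask) for i in range(0, len(cur), 2)]
    return cur[0][0][1] if cur[0] else None

if __name__ == "__main__":
    rng = random.Random(2015)
    n = 18
    lists = [[rng.getrandbits(n) for _ in range(128)] for _ in range(4)]
    print(k_tree(lists, n))
```

The demo uses lists twice as large as the minimum $2^{n/3}$ so that a solution is found with high probability; with lists of exactly $2^{n/3}$ elements one expects about one solution per run.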

Wagner's algorithm works similarly when k is not a power of two. The trick is to make some lists passive, i.e. to choose one element from each of the passive lists, and then continue with the algorithm as for the case of a power of two and the remaining lists. For instance, to solve the 6-list problem for lists $L_1, \ldots, L_6$, we take any element $v_5 \in L_5$ and $v_6 \in L_6$, and then solve the 4-list problem $x_1 \oplus x_2 \oplus x_3 \oplus x_4 = v_5 \oplus v_6$ for the lists $L_1, \ldots, L_4$. We can easily remove the non-zero condition $v_5 \oplus v_6$ in the right part, by adding this value to each element of the list $L_1$. Hence, the complexity of the k-list problem for the case when k is not a power of two equals the complexity of the closest (and smaller) power of two case. Thus, for any value of k, the k-tree algorithm works in $O(k \cdot 2^{n/(\lfloor \lg k \rfloor + 1)})$ time and memory.

3 Improved Algorithm for the 3-List Problem 

We focus on the 3-list problem and show how to improve the complexity of Wagner's 3-tree algorithm. Our improvement is based on the idea of multicollisions. The technique mimics the approach developed by Nikolić et al. [24] and further generalized by Dinur et al. [12]. We exploit the k-tree algorithm, but we also work with the passive lists and make them more active. Namely, instead of simply taking one element from the passive lists, we find in them partial multicollisions, that is, sets of words that share the same value on particular bits. We then force the active lists on these bits to have a specific value (which is the xor of all the values of the partial multicollisions), and at the final step, merge the results of the active and passive lists to obtain zero on the remaining bits. Let us take a closer look at this idea.

Definition 1. The set of n-bit words $S = \{x_1, \ldots, x_p\}$ forms a p-partial multicollision on the s least significant bits, if $\mathrm{low}_s(x_1) = \mathrm{low}_s(x_2) = \ldots = \mathrm{low}_s(x_p)$.

This is to say that all p words are equal on the last s bits. Note, the choice to work with the least significant bits is not crucial but is introduced to simplify the presentation. Given an arbitrary set, we can create a p-partial multicollision from this set, i.e. we can find a subset that is a p-partial multicollision. The maximal value of p depends on the size of the initial set and will be analyzed later in the section.

Fig. 2. Multicollision technique for k = 3. The values in blue denote the size of the lists.


Let us assume that we are given a 3-list problem with lists $L_1, L_2, L_3$, each
of size $2^l$. If we apply the k-tree algorithm, then l should equal n/2, the lists
$L_1, L_2$ will be active, while $L_3$ will be passive. Instead of marking $L_3$ as
passive, let us create a p-partial multicollision from $L_3$ on the l least
significant bits (LSB) and denote this set as $\tilde{L}_3$ (refer to Fig. 2).
Without loss of generality we can assume that the colliding value of the l bits is
zero (if not, we XOR this value to all the elements of the list $L_1$). Furthermore,
with the use of the join operator, from $L_1, L_2$ we create a list $L_{12}$ of all
values $x_1 \oplus x_2$, where $x_1 \in L_1$, $x_2 \in L_2$ and
$\mathrm{low}_l(x_1 \oplus x_2) = 0$; obviously $|L_{12}| = 2^l$ with high
probability. Finally, we use the join operator once again between $L_{12}$ and
$\tilde{L}_3$, to find the required solution. As we have to cancel the additional
n − l bits, the solution will exist with a high probability as long as
$p \cdot |L_{12}| \ge 2^{n-l}$, that is, $p \cdot 2^{2l-n} \ge 1$.
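The steps above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the authors' implementation: the function names (`low`, `largest_multicollision`, `improved_3list`) are hypothetical, the lists are plain Python lists of integers, and instead of XORing the colliding value into $L_1$ the sketch uses it directly as the target of the first join, which is equivalent.

```python
import random
from collections import defaultdict


def low(x, bits):
    """Return the `bits` least significant bits of x."""
    return x & ((1 << bits) - 1)


def largest_multicollision(lst, bits):
    """Group the list by its low bits and return (common value, largest group)."""
    buckets = defaultdict(list)
    for x in lst:
        buckets[low(x, bits)].append(x)
    return max(buckets.items(), key=lambda kv: len(kv[1]))


def improved_3list(L1, L2, L3, l):
    """Look for x1, x2, x3 with x1 ^ x2 ^ x3 == 0; return the triple or None."""
    # Passive list: keep only the largest partial multicollision on l bits.
    v, L3_mc = largest_multicollision(L3, l)

    # First join: pairs from L1, L2 whose XOR equals v on the l low bits.
    index = defaultdict(list)
    for x2 in L2:
        index[low(x2, l)].append(x2)
    L12 = {}
    for x1 in L1:
        for x2 in index.get(low(x1, l) ^ v, []):
            L12[x1 ^ x2] = (x1, x2)

    # Second join: match the multicollision set against L12 on all bits.
    for x3 in L3_mc:
        if x3 in L12:
            x1, x2 = L12[x3]
            return x1, x2, x3
    return None
```

With toy parameters such as n = 16 and l = 8 the condition $p \cdot 2^{2l-n} \ge 1$ holds for any nonempty multicollision set, so a solution is found in most runs; for realistic parameters l would be chosen slightly below n/2 as derived next.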

The complexity of our algorithm depends on the complexity of the two join
operators and of producing multicollisions. The join operators (which are indeed
simple collision searching algorithms) work in $O(2^l)$ as in each of the cases, the
sizes of the lists are not larger than $2^l$. Furthermore, the partial
multicollisions from $|L_3| = 2^l$ can be produced in $O(2^l)$ time and memory
(see footnote 3). Hence the multicollision technique solves the 3-list problem in
$O(2^l)$ time and memory.

Let us find the value of l. For this purpose we replace the inequality
$p \cdot 2^{2l-n} \ge 1$ with

$$p \cdot 2^{2l-n} = 1, \qquad (1)$$

and obtain

$$l = \frac{n}{2} - \frac{\lg p}{2}. \qquad (2)$$

3 The procedure is to initialize a counter for each possible value of the colliding
bits, and then, for each $x \in L_3$, to increase the counter indexed by
$\mathrm{low}_l(x)$. After all elements have been processed, the counter with the
highest value corresponds to the largest multicollision set.


Therefore, the complexity of our algorithm is $O(2^{n/2}/\sqrt{p})$, hence the
speed-up factor is $\sqrt{p}$. Recall that p is the size of the multicollision set
produced from the passive list $L_3$: the larger the size, the greater the speed-up.
Note that in the original Wagner’s 3-tree algorithm, one element is chosen at random
from $L_3$ and therefore the multicollision set consists of a single element. That
is, for the classical 3-tree, p = 1 and the complexity is $O(2^{n/2})$.

Let us examine the maximal possible value of p, i.e. the size of the p-partial
multicollision set on l bits produced from the set $L_3$ of size $2^l$. Theorem 5
of [29] defines the number of elements in a set required to produce a multicollision
with a high probability, and by this theorem we obtain

$$(p!)^{1/p} \cdot 2^{\frac{p-1}{p} l} = 2^{l}. \qquad (3)$$


A more straightforward way that we use to find the value of p is based on
the so-called balls-into-bins problem: m balls are thrown into m bins (the bin
for each ball is chosen uniformly at random), and the problem is to find the
expected maximum load, i.e. the expected number of balls in a bin that contains
the most balls. The solution to this problem is well known and the expected
maximum load asymptotically is:

$$\Theta\left(\frac{\ln m}{\ln \ln m}\right). \qquad (4)$$

Our multicollision problem is an instance of the balls-into-bins problem as the
number of elements in the passive list $L_3$ (the number of balls), and the size of
the multicollision space (the number of bins) are both $2^l$. Therefore, the
asymptotics of p(l) can be evaluated as $p(l) = \Theta(l / \ln l)$. Finally, as
$l \approx n/2$, we obtain that the speed-up factor $\sqrt{p}$ of our improved
3-tree (over Wagner’s 3-tree) is $\Theta(\sqrt{n/\ln n})$, thus the complexity of
our algorithm is

$$O\left(\frac{2^{n/2}}{\sqrt{n/\ln n}}\right). \qquad (5)$$

To find the actual speed-up for concrete values of n, we need to approximate
the asymptotics of p(l), i.e. we need to find the approximate value of c in
$p(l) = c \cdot l / \ln l$. For this purpose, we have run a series of experiments.
For each value of l = 10, ..., 28, we have generated $2^l$ random values (of bit
length l) and have checked the maximal number of multicollisions. For each l, the
experiments have been repeated 20 times. The outcomes of the experiments are
reported in Table 1.

Based on the experiments, the value of c can be approximated as $c \approx 1.3$.
With such an assumption, we have computed the speed-up factor of our improved
3-tree for various values of n; we refer the reader to Table 2.
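The experiment described above is easy to re-run on a small scale. The following Python sketch is an assumption-laden illustration (the helper names `max_multicollision` and `estimate_c` are ours, and Python's `random` module stands in for whatever generator the authors used): it throws $2^l$ random l-bit values into $2^l$ bins, records the maximum load, and divides the average by $l/\ln l$ to estimate c.

```python
import math
import random
from collections import Counter


def max_multicollision(l, rng):
    """Largest number of equal values among 2^l random l-bit values."""
    counts = Counter(rng.getrandbits(l) for _ in range(1 << l))
    return max(counts.values())


def estimate_c(l, trials, rng):
    """Return (average maximum load, estimated c) where p(l) = c * l / ln(l)."""
    avg = sum(max_multicollision(l, rng) for _ in range(trials)) / trials
    return avg, avg / (l / math.log(l))
```

For small l (say 10 to 16) this runs in well under a second per trial; the larger values of l reported in Table 1 need correspondingly more time and memory.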
Table 1. Experimental search of number of multicollisions.

 l    Average size    l / ln l    c
 10    5.80           4.34        1.34
 11    5.85           4.59        1.27
 12    6.10           4.83        1.26
 13    6.45           5.07        1.27
 14    7.00           5.30        1.32
 15    7.15           5.54        1.29
 16    7.55           5.77        1.31
 17    7.90           6.00        1.32
 18    8.15           6.23        1.31
 19    8.50           6.45        1.32
 20    8.70           6.68        1.30
 21    9.05           6.89        1.31
 22    9.50           7.12        1.33
 23    9.65           7.34        1.31
 24   10.30           7.55        1.36
 25   10.40           7.77        1.34
 26   10.60           7.98        1.33
 27   11.05           8.19        1.35
 28   11.15           8.40        1.33


Table 2. A comparison of the time complexities of Wagner’s 3-tree with our new
approach.

 n     Speed-up ($\sqrt{p}$)    l
 64    3.43                     31
 128   4.42                     62
 256   5.82                     126
 512   7.71                     253
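The speed-up column of Table 2 can be recomputed from the fitted formula $p(l) = c \cdot l / \ln l$ with $c \approx 1.3$. The short Python sketch below assumes exactly this formula and takes the values of l from the table rather than deriving them; the function name `speedup` is hypothetical.

```python
import math


def speedup(l, c=1.3):
    """Speed-up sqrt(p) over Wagner's 3-tree, assuming p(l) = c * l / ln(l)."""
    p = c * l / math.log(l)
    return math.sqrt(p)


# (n, l) pairs as listed in Table 2.
for n, l in [(64, 31), (128, 62), (256, 126), (512, 253)]:
    print(n, l, round(speedup(l), 2))
```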


The above strategy is in line with the multicollision approach by
Nikolić et al. used in the analysis of the lightweight block cipher LED [14].
The advanced approach by Dinur et al., however, cannot be used for further
improvements. One of their main ideas is to work simultaneously with a few
multicollisions, instead of only one. In the case of the k-tree algorithm, this
would mean to produce from $L_3$ several, say s, p-partial multicollision sets.
However, each such set will collide on a different value of the l LSBs, i.e. the
elements of the first p-partial multicollision set will have the value $v_1$ on the
l LSBs, the elements of the second set will have the value $v_2$, etc. The different
values will increase the complexity of the later stage of the k-tree by a factor of
s. When using the join operator on l bits of $L_1$ and $L_2$ there will be s targets
(whereas previously we had only one), thus a simple collision search will have to be
repeated s times. Therefore, in this particular case, the approach of Dinur et al.
cannot be used.


Improvements for k > 3. Our technique can be applied as well to improve
the k-tree algorithm for larger (and non-power of two) values of k. Again, we will
start with the classical k-tree and assume that all the lists are of size $2^l$
(where $l < \frac{n}{\lfloor \lg k \rfloor + 1}$). Given k that is not a power of 2,
the number of active lists $k_A$ is $2^{\lfloor \lg k \rfloor}$ and the number of
passive lists $k_P$ is $k - k_A$. For instance, for k = 7, it means that $k_A = 4$,
$k_P = 3$ (see Fig. 3). Without loss of generality, assume that the first $k_A$ lists
are active, and the remaining lists are passive. First, we produce p-partial
multicollision sets on $\Delta = l \cdot \lg k_A$ bits, independently for all of the
$k_P$ passive lists, and obtain $\tilde{L}_{k_A+1}, \ldots, \tilde{L}_k$. Let
$v_1, \ldots, v_{k_P}$ be the common values of these sets, and
$v = v_1 \oplus \ldots \oplus v_{k_P}$. Obviously the set
$L_P = \tilde{L}_{k_A+1} \bowtie \ldots \bowtie \tilde{L}_k$ has cardinality
$|L_P| = p^{k_P}$ and all the elements of the set have the value v on the $\Delta$
LSBs. For the sake of simplicity, assume v = 0. Next, focus on the active lists
and find $2^l$ solutions for the $k_A$-list problem on the same
$\Delta = l \cdot \lg k_A$ bits by running the $k_A$-tree with initial lists of
sizes $2^l$. Note that this way at level $\lg k_A$ there will be one list with $2^l$
elements that have zeros on the $\Delta$ LSBs. If the number of solutions produced
from the two independent steps satisfies $p^{k_P} \cdot 2^l \ge 2^{n-\Delta}$, then
one of the elements of the list found by the $k_A$-tree algorithm can be matched with
one of the elements of $L_P$ on the remaining $n - \Delta$ bits. As a result we will
obtain one solution to the original k-list problem.

Let us focus on the complexity of the algorithm. The p-partial multicollisions
produced from the passive lists $L_i$, $i = k_A + 1, \ldots, k$, require around
$k_P \cdot |L_i| = k_P \cdot 2^l$ operations. Under the assumption that $k_P$ is
small, the additional $k_P + p^{k_P}$ operations spent on producing v and $L_P$ can
be ignored as the whole complexity will be dominated by the multicollisions. On the
other hand, the production of $2^l$ elements with the $k_A$-tree requires around
$k_A \cdot 2^{\frac{\Delta}{\lg k_A}} = k_A \cdot 2^l$ operations. As a result, the
total complexity of the algorithm is around
$k_P \cdot 2^l + k_A \cdot 2^l = (k_P + k_A) \cdot 2^l = k \cdot 2^l$. Let us find
the value of l. For this purpose we equate $p^{k_P} \cdot 2^l$ to $2^{n-\Delta}$
(specified in the inequality above), and obtain
$k_P \lg p + l = n - l \cdot \lg k_A$, or equivalently

$$l = \frac{n}{\lg k_A + 1} - \frac{k_P \lg p}{\lg k_A + 1}. \qquad (6)$$

Fig. 3. Multicollision technique for k = 7. The values in blue denote the size of the lists.


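Therefore the improved k-tree outperforms the classical k-tree by a factor of

$$\frac{k \cdot 2^{\frac{n}{\lg k_A + 1}}}{k \cdot 2^{l}} = 2^{\frac{k_P \lg p}{\lg k_A + 1}} = \left(2^{\lg p}\right)^{\frac{k_P}{\lg k_A + 1}} = p^{\frac{k_P}{\lg k_A + 1}}. \qquad (7)$$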
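To make the exponent in (7) concrete, the following Python sketch computes the split into active and passive lists and the resulting exponent of p. It is a small illustration with hypothetical helper names (`split_lists`, `speedup_exponent`) and only restates the definitions $k_A = 2^{\lfloor \lg k \rfloor}$, $k_P = k - k_A$ given above.

```python
from fractions import Fraction


def split_lists(k):
    """Return (k_A, k_P): the largest power of two not exceeding k, and the rest."""
    k_active = 1 << (k.bit_length() - 1)
    return k_active, k - k_active


def speedup_exponent(k):
    """Exponent e such that the speed-up in (7) equals p**e."""
    k_active, k_passive = split_lists(k)
    # For a power of two k_A, lg(k_A) + 1 equals k_A.bit_length().
    return Fraction(k_passive, k_active.bit_length())
```

For k = 3 the exponent is 1/2, which recovers the $\sqrt{p}$ speed-up of the improved 3-tree, while for k = 7 the exponent is 1.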


The value of p can be approximated as follows. First note that we can no longer
use the balls-into-bins problem, as the size of the lists (i.e. $2^l$) does not
necessarily equal the size of the multicollision space (e.g., when k = 7, the space
has $2^{2l}$ elements). Therefore, we use (3) to approximate the number of
multicollisions.
From (3), with a simple transformation we obtain that ^ = lg | . The approxi- 
mate solution of this equation is of the form p = ^jrr- Therefore, the speed up 
factor of our improved k-tree algorithm can be evaluated as

$$p^{\frac{k_P}{\lg k_A + 1}} = \frac{a \cdot n^{c}}{\lg (b \cdot n)}, \qquad (8)$$

where a, b, c are constants that depend on the values of $k_A$ and $k_P$.


Applications. The improvement of the 3-tree algorithm can be used for
cryptanalysis of authenticated encryption schemes proposed to the ongoing CAESAR
competition [3]. Some of these schemes, to process the final incomplete blocks of
messages, use a construction called XLS proposed by Ristenpart and Rogaway [27].
Initially, XLS was proven to be secure, however Nandi [22] points out flaws in the
security proof and shows a very simple attack that requires three queries to break
the construction. However, the CAESAR candidates that rely on XLS do not
allow this trivial attack as the required decryption queries are not permitted by
the schemes. To cope with this limitation, Nandi proposes another attack [23]
that requires only encryption queries. He is able to reduce the design flaw of
XLS to the 3-list problem. Therefore, Nandi is able to attack schemes that claim
birthday bound query complexity because with only $2^{n/3}$ queries (equivalent to
the size of the lists in the 3-list problem), he can find a solution to the 3-list
problem (in $2^{2n/3}$ time). However, Nandi cannot break the schemes that claim
birthday bound time complexity, as he cannot solve the 3-list problem faster than
$2^{n/2}$. Note that Nandi constructs the 3-list problem from only $2^{n/3}$ queries,
rather than from $3 \cdot 2^{n/3}$, as the elements of all three lists depend on the
same $2^{n/3}$ ciphertexts.

The CAESAR schemes based on XLS, such as Deoxys [16], Joltik [17],
KIASU [18], Marble [13], SHELL [32], claim only birthday bound time complexity,
thus Nandi’s findings do not break the security claims of these candidates.
However, our improved 3-tree algorithm goes below the birthday bound and thus
can be used to show a slight weakness in some of these candidates.

Let us focus on the 128-bit CAESAR candidates Deoxys and KIASU. The 3-list
problem for XLS in these candidates has the parameter n = 128. According
to Table 2, we can take $\sqrt{p} = 4.42$ and l = 62. Consequently, the complexity
of a forgery based on the XLS weakness is $C \cdot 2^{62}$, where C is a constant
factor. The value of C is 1 because: (1) as mentioned above, the 3 lists can be
produced from the same $2^{62}$ ciphertexts, and (2) all of the operations required
by the improved 3-tree algorithm are significantly less expensive than one
encryption of the analyzed schemes. As a result, we obtain a forgery on the COPA
modes of Deoxys and KIASU in $2^{62}$ encryptions and thus the security level of
these schemes is reduced by 2 bits from the claimed 64 bits.
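As a quick sanity check of the figure l = 62 (a worked calculation of ours, using only the speed-up value 4.42 from Table 2 and the complexity $2^{n/2}/\sqrt{p}$ derived in Sect. 3):

$$\lg\left(\frac{2^{128/2}}{\sqrt{p}}\right) = 64 - \lg 4.42 \approx 64 - 2.14 = 61.86,$$

which rounds up to the exponent 62 used above.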

4 Improved Time-Memory Tradeoffs 

In applications, usually the elements of the lists $L_i$ are in fact outputs of
functions $f_i$, thus GBP is often formulated as:

Problem 2. Given non-invertible functions
$f_1, \ldots, f_k : \{0,1\}^{n'} \to \{0,1\}^{n}$, $n' \ge n$, find
$y_1, \ldots, y_k \in \{0,1\}^{n'}$ such that
$f_1(y_1) \oplus f_2(y_2) \oplus \cdots \oplus f_k(y_k) = 0$.

In some applications, all the functions are identical, and the problem is to
find distinct inputs:

Problem 3. Given a non-invertible function $f : \{0,1\}^{n'} \to \{0,1\}^{n}$,
$n' \ge n$, find distinct $y_1, \ldots, y_k \in \{0,1\}^{n'}$ such that
$f(y_1) \oplus f(y_2) \oplus \cdots \oplus f(y_k) = 0$.

Both of the above problems give rise to the possibility of time-memory
tradeoffs, i.e., reducing the memory complexity of the k-tree algorithm at the
expense of time. We will investigate time-memory tradeoffs for the GBP as defined in
Problem 3. Recall that the k-tree in its current form assumes that both time and
memory are of equal magnitude, i.e. $T = M = O(k \cdot 2^{\frac{n}{\lg k + 1}})$.

Bernstein et al. [5,6] investigate the k-tree in memory restricted environments
and propose a few tradeoffs. Their main approach is to solve the k-list problem
on less than n bits. Assume $M = 2^m$, where $M < 2^{\frac{n}{\lg k + 1}}$. Then, a
k-list problem on $\tilde{n} = m(\lg k + 1)$ bits (instead of n bits) can easily be
solved with the k-tree algorithm. The first tradeoff idea is to perform a
precomputation (or prefiltration) such that all the entries in each list have the
value of 0 in the $n - \tilde{n}$ most significant bits (see footnote 4). For the
remaining $\tilde{n}$ least significant bits, they apply the k-tree algorithm and
thus find a solution for all n bits. The time complexity is the sum of the cost for
the precomputation and for solving the k-tree algorithm, which is
$k \cdot (2^{n-\tilde{n}} \cdot 2^m + 2^m) \approx k \cdot 2^{n - m \lg k}$. The
tradeoff is therefore defined as

$$T \cdot M^{\lg k} = k \cdot N. \qquad (9)$$

4 It is pointed out in [6] that the $n - \tilde{n}$ bits can have an arbitrary value
as long as the sum of all lists is zero. The technique is called clamping through
precomputations.
Their second idea is similar but does not use precomputation. They apply the
k-tree algorithm on $\tilde{n} = m(\lg k + 1)$ bits until the value of the remaining
$n - \tilde{n}$ bits probabilistically becomes zero. Obviously, in total there will
be $2^{n-\tilde{n}}$ repetitions of the k-tree, thus the time complexity becomes
$T = k \cdot 2^m \cdot 2^{n-\tilde{n}} = k \cdot 2^{n - m \lg k}$, which provides
the same tradeoff as the previous one, i.e.,

$$T \cdot M^{\lg k} = k \cdot N. \qquad (10)$$

The third idea also relies on reduction of n, but the technique is more advanced.
Assume $f_1 = f_2$, $f_3 = f_4$, ..., i.e. the functions are pairwise identical. The
k-list problem is regarded as two separate $\frac{k}{2}$-list problems, the first
involving the functions $f_1, f_3, \ldots$, while the second involves
$f_2, f_4, \ldots$. If the amount of available memory is $2^m$, then it is possible
to solve each of these $\frac{k}{2}$-list problems on up to
$\tilde{n} = m(\lg \frac{k}{2} + 1) = m \cdot \lg k$ bits. By elevating the two
$\frac{k}{2}$-lists to a k-list, the remaining $n - \tilde{n}$ bits can be zeroed
with the use of a memoryless collision search algorithm. Therefore the time
complexity is
$T = \frac{k}{2} \cdot 2^{\frac{n-\tilde{n}}{2}} \cdot 2^m = \frac{k}{2} \cdot 2^{\frac{n}{2} - m(\frac{\lg k}{2} - 1)}$
and their tradeoff curve is defined as

$$T \cdot M^{\frac{\lg k}{2} - 1} = \frac{k}{2} \cdot N^{\frac{1}{2}},$$

which is converted to

$$T^2 \cdot M^{\lg k - 2} = \frac{k^2}{4} \cdot N. \qquad (11)$$

Because this method solves the $\frac{k}{2}$-list problem, it is meaningful when
$k \ge 4$. We note that when $M < 2^{\frac{n}{\lg k + 2}}$, then (11) provides a
better tradeoff while for $M > 2^{\frac{n}{\lg k + 2}}$, (9) is better.
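The crossover between the two curves can be checked numerically. The sketch below is an illustration under simplifying assumptions: it works with exponents ($t = \lg T$, $m = \lg M$), drops the constant factors k and $k^2/4$, and assumes k is a power of two; the function names `t_eq9` and `t_eq11` are ours.

```python
from fractions import Fraction


def t_eq9(n, m, k):
    """lg T on curve (9): T * M^(lg k) = N, constants ignored."""
    d = k.bit_length() - 1          # d = lg k for k a power of two
    return Fraction(n) - d * Fraction(m)


def t_eq11(n, m, k):
    """lg T on curve (11): T^2 * M^(lg k - 2) = N, constants ignored."""
    d = k.bit_length() - 1
    return (Fraction(n) - (d - 2) * Fraction(m)) / 2


def crossover(n, k):
    """Memory exponent m = n / (lg k + 2) at which (9) and (11) meet."""
    d = k.bit_length() - 1
    return Fraction(n, d + 2)
```

For example, with n = 120 and k = 8 the curves meet at m = 24; below that (11) gives the smaller time exponent, above it (9) does.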

The k-tree relies on producing multiple collisions. For instance, at the first
level of the 4-tree, $2^{n/3}$ colliding pairs on $\frac{n}{3}$ bits are produced.
Producing these pairs is trivial when the amount of available memory is $2^{n/3}$.
However, once the memory is reduced to $2^m$, $m < \frac{n}{3}$, the trivial
collision search does not work.

The fact that the k-tree requires multiple collisions opens doors to the following
technique based on Hellman’s tables [15] (see footnote 5).

Fact 1. (Hellman’s table) Let $f : \{0,1\}^{*} \to \{0,1\}^{n}$ be an arbitrary-size
input and n-bit output function, $N = 2^n$, and let $M = 2^m$ be the amount of
available memory. Once the precomputation equivalent to MX evaluations of f is
performed, the cost of generating new collisions for f is $\frac{N}{MX}$ per
collision.

The technique works as follows. Choose M distinct values $v_i^0 \in \{0,1\}^n$,
where i = 1, 2, ..., M. For each of them, compute chains of length X with the target
function f, i.e. compute $v_i^j \leftarrow f(v_i^{j-1})$ for i = 1, ..., M,
j = 1, ..., X, and store only the first and last values of each chain, i.e.
$(v_i^0, v_i^X)$, in a precomputation table $T_{pre}$. The construction of $T_{pre}$
is depicted in Fig. 4. Note that even though MX values exist in all the chains, only
2M values are stored in $T_{pre}$. Once $T_{pre}$ is constructed, to generate a
collision, start with a random point and construct a chain of length $\frac{N}{MX}$.
As there are N possible values, and MX of them are covered by $T_{pre}$, one
point of the new chain will collide with one point of the chains created during
the construction of the table. The match can be detected by further extending
the new chain at most X times, as eventually it will reach one of the $v_i^X$ stored
in $T_{pre}$. Then, the exact colliding values can be detected by recalculating the
chains from $v_i^0$ and the starting value of the new chain. Obviously $T_{pre}$ can
be reused to find not only one, but multiple collisions.

Fig. 4. Hellman’s table $T_{pre}$ when M memory is available.

5 Note, we could not exploit the more advanced Rivest’s distinguished points and
Oechslin’s rainbow tables [25] to improve the analysis.

Joux and Lucks [20] use this technique to produce 3-collisions. They set
$M = X = 2^{n/3}$ to generate $2^{n/3}$ ordinary collisions with time
$T = 2^{2n/3}$ and memory $M = 2^{n/3}$. Then, they find another collision between
the $2^{n/3}$ ordinary collisions and $2^{2n/3}$ single values. When they generate
the $2^{n/3}$ ordinary collisions, Hellman’s table has an important role in keeping
the memory at M rather than MX.

Further, we will use Hellman’s table to produce multiple collisions for the
first level of the k-tree, but only on certain l bits (where l < n).

4.1 Improved Time-Memory Tradeoffs for the 4-List Problem 

We present a more efficient time-memory tradeoff for GBP. Our tradeoff curve 
depends on the number of available lists, which is parameterized by k. For a 
better understanding, first we explain our algorithm for k = 4. 

The original 4-tree algorithm consists of two-level collision searches (the para- 
meter l used below will be determined later). 

Level 1. Construct two lists, $L_{12}$ and $L_{34}$, each containing
$2^{\frac{n-l}{2}}$ partial collisions on l bits.

Level 2. Find a collision between the elements of $L_{12}$ and $L_{34}$ on the
remaining n − l bits.
Fig. 5. Improved time-memory tradeoff for the 4-list problem with Hellman’s table.

Our new 4-tree algorithm works similarly with the exception of Level 1. At
this level, we first construct Hellman’s table, and then we use it to find
$2 \cdot 2^{\frac{n-l}{2}}$ collisions. As a result, our algorithm decomposes
Level 1 into two parts. Its complexity depends on the available memory M which in
turn determines the length of the chains X. The updated 4-tree is illustrated in
Fig. 5 and is specified as follows.

Level 1a. Construct Hellman’s table containing M chains, each of length X.

Level 1b. With the use of Hellman’s table, find $2 \cdot 2^{\frac{n-l}{2}}$ partial
collisions on l bits. Store a half ($2^{\frac{n-l}{2}}$) of them in a list $L_{12}$
and the other half in $L_{34}$.

Level 2. Find a collision between the elements of $L_{12}$ and $L_{34}$ on the
remaining n − l bits.


Construction of Hellman’s Table. For Level 1a our algorithm first constructs
Hellman’s table which contains M chains of length X. However, unlike
in [20], we have the following technical obstacle. The function f takes an n-bit
input and produces an n-bit output and thus for such a function only full n-bit
collisions can be identified. In other words, the classical Hellman’s table cannot
be used to find partial collisions.

To solve this problem, we define a reduction function
$f_l : \{0,1\}^{l} \to \{0,1\}^{l}$ so that only the l bits are meaningful in the
chain. For generating chains with $f_l$, n − l bits of 0’s are concatenated to the
l-bit input, and this value is processed with $f : \{0,1\}^{n} \to \{0,1\}^{n}$.
Finally, the n-bit output is truncated to l bits, and is used as the input to the
next step of the chain. That is, $f_l(x) = \mathrm{Trunc}_l(f(0^{n-l} \| x))$, where
$\mathrm{Trunc}_l(\cdot)$ truncates the n-bit input to the l least significant bits.

To summarize, we choose M distinct l-bit values $v_i^0$ for i = 1, 2, ..., M, and for
each of them generate a chain of length X by computing $v_i^{j+1} = f_l(v_i^j)$,
where j = 0, 1, ..., X − 1. In total, MX values are in all the chains and only the
first and the last points of each chain are stored in $T_{pre}$. Thus Hellman’s
table requires around MX time and M memory.
Generation of l-bit Collisions. According to Fact 1, once Hellman’s table
$T_{pre}$ is constructed, the complexity for producing l-bit collisions is reduced
significantly. Considering that the size of the values in the chains is l bits and
the length of each chain is X, Fact 1 shows that the cost is $\frac{2^l}{MX}$ per
collision.

To generate an l-bit collision, we choose a random l-bit value and with the
function $f_l$ compute from it a chain of length $\frac{2^l}{MX} + X$. On average,
one collision against the MX values covered by $T_{pre}$ will occur within the first
$\frac{2^l}{MX}$ values of this new chain. The computation of the additional X
values in the chain ensures that the corresponding $v_i^X$ will appear as one of the
ending points of $T_{pre}$. The exact colliding pairs are detected by recomputing the
chains from $v_i^0$ and the chosen l-bit value.

From the definition of $f_l$, the two inputs colliding on l bits always have the form
$(0^{n-l} \| l_1, 0^{n-l} \| l_2)$, where $0^{n-l}$ is a sequence of n − l zero bits
and $l_1$ and $l_2$ are some l-bit values. A collision of the two chains means that
$\mathrm{Trunc}_l(f(0^{n-l} \| l_1)) = \mathrm{Trunc}_l(f(0^{n-l} \| l_2))$.
Therefore, $f(0^{n-l} \| l_1)$ and $f(0^{n-l} \| l_2)$ only collide in the least
significant l bits, while on the remaining n − l bits they behave randomly.
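The construction of the table and the generation of l-bit collisions can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: a truncated SHA-256 stands in for the target function f, the helper names (`make_f`, `make_fl`, `build_table`, `find_partial_collision`) are hypothetical, and the parameters are tiny compared to a real attack. Since integers carry implicit leading zeros, prepending $0^{n-l}$ to an l-bit input needs no explicit step.

```python
import hashlib
import random


def make_f(n):
    """Toy n-bit to n-bit function (assumption: SHA-256 truncated to n bits)."""
    def f(x):
        digest = hashlib.sha256(x.to_bytes((n + 7) // 8, "big")).digest()
        return int.from_bytes(digest, "big") & ((1 << n) - 1)
    return f


def make_fl(f, l):
    """Reduction function f_l(x) = Trunc_l(f(0^{n-l} || x))."""
    mask = (1 << l) - 1
    return lambda x: f(x) & mask


def build_table(fl, l, M, X, rng):
    """Hellman's table: map the end point of each of M chains to its start point."""
    table = {}
    for start in rng.sample(range(1 << l), M):
        v = start
        for _ in range(X):
            v = fl(v)
        table[v] = start
    return table


def find_partial_collision(fl, l, table, X, rng):
    """Try one new chain; return (a, b) with a != b and fl(a) == fl(b), or None."""
    start = rng.randrange(1 << l)
    budget = (1 << l) // (len(table) * X) + X      # chain length 2^l/(MX) + X
    v = start
    for steps in range(budget + 1):
        if v in table:
            a, b = start, table[v]
            # Align both chains so that they are equally far from the end point.
            for _ in range(max(steps - X, 0)):
                a = fl(a)
            for _ in range(max(X - steps, 0)):
                b = fl(b)
            if a == b:
                return None                        # chains coincide, no collision
            for _ in range(min(steps, X)):
                na, nb = fl(a), fl(b)
                if na == nb:
                    return a, b
                a, b = na, nb
            return None
        v = fl(v)
    return None
```

Each returned pair (a, b) collides under $f_l$, so f(a) and f(b) agree on the l least significant bits only, exactly the kind of partial collision needed at Level 1b. Not every call succeeds; in practice the search is simply repeated with fresh starting points.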

The collision generation process is iterated $2^{\frac{n-l}{2}}$ times and the input
and output of each pair is stored in $L_{12}$. Similarly, the process is iterated an
additional $2^{\frac{n-l}{2}}$ times and the results are stored in $L_{34}$.
Therefore the complexity of this step is around
$2 \cdot 2^{\frac{n-l}{2}} \cdot \frac{2^l}{MX} = 2 \cdot \frac{N^{1/2} \, 2^{l/2}}{MX}$
time and $2 \cdot 2^{\frac{n-l}{2}} = 2 \cdot \frac{N^{1/2}}{2^{l/2}}$ memory.


Finding a Solution to the 4-list Problem. From the two lists $L_{12}$ and $L_{34}$
containing $2^{\frac{n-l}{2}}$ partial collisions on l bits, we find a collision on
the remaining n − l bits. This procedure is straightforward and it requires
$2^{\frac{n-l}{2}} = \frac{N^{1/2}}{2^{l/2}}$ time and no additional memory.


Parameters and the Tradeoff. The complexities for each step are as follows:

Level 1a.  Time = $MX$,                                     Memory = $M$
Level 1b.  Time = $2 \cdot \frac{N^{1/2} \, 2^{l/2}}{MX}$,  Memory = $2 \cdot \frac{N^{1/2}}{2^{l/2}}$
Level 2.   Time = $\frac{N^{1/2}}{2^{l/2}}$,                Memory = negligible

To balance the memory at Level 1a and Level 1b, M, X, l should satisfy the
relation $M = 2 \cdot \frac{N^{1/2}}{2^{l/2}}$. From this relation, the time
complexity of Level 2 becomes $\frac{M}{2}$, and thus is negligible compared to
Level 1a when X is sufficiently large. To balance the time complexities of Level 1a
and Level 1b, we need $MX = 2 \cdot \frac{N^{1/2} \, 2^{l/2}}{MX}$, which gives the
relation $M^3 X^2 = 4 \cdot N$. Finally, as the time complexity T satisfies
T = MX, we obtain the following tradeoff curve

$$T^2 \cdot M = 4 \cdot N. \qquad (12)$$

For instance, when the available memory is $2^{n/4}$ (instead of $2^{n/3}$ as in the
original 4-tree), then the updated 4-tree finds a solution in around $2^{3n/8}$ time.
This is to be compared to the tradeoffs of Bernstein et al. given in (9) and (10)
which require around $2^{n/2}$ time. Additional points of the tradeoff curve and a
comparison to previous results are given in Table 4.

During the analysis, we relied implicitly on several facts. First, we assumed
that Hellman’s table can contain an arbitrary number of points. In order to
avoid collisions between the chains, however, the values of M and X cannot
be arbitrary, but should depend on l. That is, during the construction of
Hellman’s table, the number of chains and their length is bounded by the value
of l. Biryukov and Shamir in [7] call this a matrix stopping rule, and define
it as $MX^2 \le 2^l$. It is trivial to see that this inequality holds in our case as
$MX^2 = \frac{M^3 X^2}{M^2} = \frac{4N}{M^2} = 2^l$. For instance, when
$M = 2^{n/4}$, then $l = \frac{n}{2}$, $T = 2^{3n/8}$, $X = 2^{n/8}$. Therefore,
obviously $MX^2 = 2^{n/2} = 2^l$. We assumed as well that the tradeoff applies only
to Problem 3. However, a close inspection of our algorithm reveals that it can be
applied to the case of pairwise identical functions, i.e., $f_1 = f_2$, $f_3 = f_4$.
That is, the area of application of the tradeoff is wider, and is similar to the
area of the tradeoff given by Bernstein et al. in (11). To deal with the extended
case, we have to create two Hellman’s tables at the initial stage, one for each pair
of functions. Thus the time and memory complexities will increase by a factor of two
at Level 1a, and will stay the same at Levels 1b and 2.


4.2 Improved Time-Memory Tradeoff for the k-list Problem

In this section, we generalize the time-memory tradeoff for the k-tree algorithm,
where $k = 2^d$. Overall, we replace the collision generation at Level 1 of the
k-tree algorithm with a generation based on Hellman’s table. Hereafter, we call the
bits whose sum is fixed to zero clamped bits.

The ordinary k-tree algorithm initially starts from $2^d$ lists containing
$M = 2^m$ elements. At Level 1, $2^{d-1}$ lists containing M elements are generated
with m bits clamped. At Level i for i = 2, 3, ..., d − 1, $2^{d-i}$ lists containing
M elements are generated with im bits clamped. At the last Level d there are two
lists containing M elements with (d − 1)m bits clamped. As no longer M collisions
are required, but rather only one, the sum on up to (d + 1)m bits can be 0. By
setting (d + 1)m = n, the k-tree algorithm will thus find the solution to
the k-list problem. However, if the memory size is restricted, i.e.
$m \ll \frac{n}{d+1}$, the k-tree algorithm can enforce the sum of only (d + 1)m
bits to zero.

Our algorithm replaces Level 1 with Hellman’s table collision generation and
performs the same procedure as the k-tree algorithm from Level 2 to Level d.
To find the required solution after Level d, however, at Level 1 we clamp more
bits. Let the number of the clamped bits at Level 1 be l. After the first level
we will have $2^{d-1}$ lists, each with $M = 2^m$ elements. Similarly, after Level i
for i = 2, 3, ..., d − 1, we will have $2^{d-i}$ lists containing M elements with
l + (i − 1)m bits clamped. After the final Level d, we will have one element with
l + dm zero bits. Therefore, we set l + dm = n, i.e. l = n − dm, to get at least one
solution on all n bits. In Table 3, we compare the number of clamped bits of the
k-tree and our algorithm.

Table 3. A comparison of the number of clamping bits between the k-tree and our
algorithm.

                                # lists      # Clamped bits
                                             k-tree algorithm    Our algorithm
 Level 1                        $2^{d-1}$    m                   l
 Level i (i = 2, ..., d − 1)    $2^{d-i}$    im                  l + (i − 1)m
 Level d                        1            (d + 1)m            l + dm

From the condition l = n − dm and the parameters k and m, we can determine
the reduction function $f_l$ for Hellman’s table. We create M chains of length X,
and only store the first and last values of the chains in Hellman’s table
$T_{pre}$. Once $T_{pre}$ is constructed, we can find an l-bit partial collision
with a cost of $\frac{2^l}{MX}$ per collision, which is equivalent to
$\frac{N}{M^{d+1} X}$. At Level 1, we produce in total $2^{d-1} \cdot M$ l-bit
collisions, and store them in $2^{d-1}$ lists each with M elements. The total cost
for producing the partial collisions and thus the complexity of Level 1 is
$2^{d-1} \cdot \frac{N}{M^{d} X}$.


Complexity Evaluation and the Tradeoff Curve. The complexity to generate
$T_{pre}$ is MX time and M memory. As mentioned above, Level 1 requires
$2^{d-1} \cdot \frac{N}{M^{d} X}$ time and $2^{d-1} \cdot M$ memory. The time and
memory complexities of the remaining Levels 2 to d are all M, thus negligibly small
compared to the generation of $T_{pre}$. We balance the time complexity of Hellman’s
table generation and of Level 1, which gives the relation
$T = MX = 2^{d-1} \cdot \frac{N}{M^{d} X}$, and can further be reduced to
$(MX)^2 = 2^{d-1} \cdot \frac{N}{M^{d-1}}$ and approximately results in a tradeoff
curve

$$T^2 \cdot M^{\lg k - 1} = k \cdot N. \qquad (13)$$

Note that the tradeoff given in Sect. 4.1 can be obtained from the above tradeoff
by setting k = 4. In Table 4, we compare the previous tradeoffs given in (9),
(11) to our new tradeoff for k = 4, 8 and for two particular memory amounts.
Obviously, the time complexity of our algorithm is significantly smaller for the
same amount of available memory.

The tradeoff curves of these three methods are also depicted in Fig. 6. The
vertical axis and horizontal axis represent the logarithm of the time complexity t
and memory complexity m, respectively. Curves for k = 8 and k = 16 are drawn
in Fig. 6 with red lines and black lines, respectively. For k = 8 with
$m \ge \frac{n}{4}$ the ordinary k-tree algorithm with $t = \frac{n}{4}$ can be
performed. Thus, the time-memory tradeoffs are meaningful only when the memory amount
is limited to $m < \frac{n}{4}$ and Fig. 6 only describes the curves in this range.
Similarly, for k = 16 only $m < \frac{n}{5}$ is shown in the figure.

Table 4. Comparison of tradeoffs. For simplicity, the constant multiplication for N is
ignored.

 Method                                               M            T             Other parameters
 k = 4
 Bernstein et al. Eq. (9)   ($T \cdot M^2 = N$)       $2^{n/4}$    $2^{n/2}$     -
                                                      $2^{n/6}$    $2^{2n/3}$    -
 Our                        ($T^2 \cdot M = N$)       $2^{n/4}$    $2^{3n/8}$    $X = 2^{n/8}$, $l = n/2$
                                                      $2^{n/6}$    $2^{5n/12}$   $X = 2^{n/4}$, $l = 2n/3$
 k = 8
 Bernstein et al. Eq. (9)   ($T \cdot M^3 = N$)       $2^{n/5}$    $2^{2n/5}$    -
                                                      $2^{n/6}$    $2^{n/2}$     -
 Bernstein et al. Eq. (11)  ($T^2 \cdot M = N$)       $2^{n/5}$    $2^{2n/5}$    -
                                                      $2^{n/6}$    $2^{5n/12}$   -
 Our                        ($T^2 \cdot M^2 = N$)     $2^{n/5}$    $2^{3n/10}$   $X = 2^{n/10}$, $l = 2n/5$
                                                      $2^{n/6}$    $2^{n/3}$     $X = 2^{n/6}$, $l = n/2$
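The parameters in the “Our” rows of Table 4 follow mechanically from l = n − dm and the tradeoff (13). The Python sketch below recomputes them in the exponent domain; it is an illustration under the same simplification as the table (constant factors ignored), it assumes k is a power of two, and the function name `our_tradeoff` is hypothetical.

```python
from fractions import Fraction


def our_tradeoff(n, m, k):
    """Exponents (base 2) of the parameters for memory M = 2^m, constants ignored.

    Returns a dict with l (clamped bits at Level 1), t = lg T and x = lg X.
    """
    d = k.bit_length() - 1
    if k != 1 << d:
        raise ValueError("k must be a power of two")
    n, m = Fraction(n), Fraction(m)
    l = n - d * m                    # l + d*m = n
    t = (n - (d - 1) * m) / 2        # T^2 * M^(lg k - 1) = N
    x = t - m                        # T = M * X
    return {"l": l, "t": t, "x": x}
```

For example, `our_tradeoff(120, 30, 4)` gives l = 60, t = 45, x = 15, i.e. l = n/2, $T = 2^{3n/8}$, $X = 2^{n/8}$ for $M = 2^{n/4}$. In this simplified form the matrix stopping rule holds with equality, since m + 2x = l.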

The previous curve given in (9) achieves the same time complexity as the
k-tree algorithm when sufficient memory is available, while the time complexity
is about $2^n$ when the available amount of memory is very limited. The previous
curve given in (11) cannot reach the time complexity of the k-tree algorithm even
if sufficient memory is available, while the time complexity is at most $2^{n/2}$
for a very limited amount of memory. It is easy to see that our tradeoff takes
advantage of those two curves, i.e. it requires the same complexity as the k-tree
algorithm when sufficient memory is available and requires only $2^{n/2}$ time
complexity when the available amount of memory is limited. Therefore, our tradeoff
always allows a lower time complexity than both of the previous tradeoffs. It
improves the time complexity and simplifies the situation, as it is the best for any
value of m (unlike the previous two tradeoffs that outperformed each other for
different values of m).

Fig. 6. Comparison of tradeoff curves. Our curve for k = 8 and Bernstein et al. 2 for
k = 16 are overlapped in the range of m < n/5. (Plot of t against m; legend: k = 8
(Ours), k = 8 (Bernstein et al. 1), k = 8 (Bernstein et al. 2), k = 16 (Ours), k = 16
(Bernstein et al. 1), k = 16 (Bernstein et al. 2).)

5 Conclusion 

We have shown improvements to Wagner’s k-tree algorithm for the case when
k is not a power of two, and when the available memory is restricted. For the
former case, our findings indicate that the passive lists can be used to reduce
the complexity of the k-tree (in the case of the 3-tree, by a factor of
$\Theta(\sqrt{n/\ln n})$). Rather than discarding the passive lists, we have produced
multicollision sets from them, and later, we have used the sets to decrease the size
and thus the complexity of the k-tree algorithm. In the case of a memory restricted
k-list problem, we have provided a new time-memory tradeoff based on the idea of
Hellman’s table. The precomputed table has allowed us to efficiently produce a
large number of collisions at the very first level of the k-tree algorithm, and thus
to reduce the memory requirement of the whole algorithm. As a result, we have
achieved an improved tradeoff that follows the curve
$T^2 \cdot M^{\lg k - 1} = k \cdot N$.

We point out that we have run a series of experiments to confirm parts of
the analysis. In particular, we have verified the predicted number of
multicollisions, and we have completely implemented the tradeoff for k = 4, n = 60
and various sizes of available memory, e.g., m = 8, 10, 14. The outcome of the
experiments has confirmed the tradeoff.

The 3-list problem appears frequently in the literature, and as our improved
3-tree algorithm is the first that solves this problem with complexity below the
birthday bound, we expect future applications of the algorithm. However, although
our improved 3-tree asymptotically outperforms Wagner’s 3-tree algorithm, the
speedup factor is lower for smaller values of n. Thus we urge careful analysis
when applying the improved 3-tree.

Bernstein [5] argues that the large memory requirement of Wagner’s k-tree
algorithm makes it impractical. He assumes that memory access is far more
expensive, so that the actual cost of the algorithm is miscalculated. He introduces
tradeoffs (discussed in Sect. 4) to reduce the memory requirement, and to obtain
algorithms of lower complexity (measured by the new metric). We note that as
our tradeoffs are more memory-efficient, by the new metric they lead to better
algorithms for the k-tree problem with pairwise identical functions.

There are several future research directions. One is to consider restrictions
on the amount of available data. The functions $f_i$ in the k-list problem are often
assumed to be public, i.e., the attacker can evaluate them offline. When the $f_i$ are not
public, the data needs to be collected by making online queries. Thus developing
new time-memory-data tradeoffs for this scenario is an interesting open problem.
Another direction is to consider the weight of each function in the total cost of
the algorithm, which leads to the case of an unbalanced GBP. This is based on
the fact that in specific applications, it may occur that some of the functions
are more costly to compute than other functions. The algorithm that solves an
unbalanced GBP will be different from the one for the balanced GBP.

Abstract. We assume a scenario where an attacker can mount several
independent attacks on a single CPU. Each attack can be run several 
times in independent ways. Each attack can succeed after a given num- 
ber of steps with some given and known probability. A natural question 
is to wonder what is the optimal strategy to run steps of the attacks in a 
sequence. In this paper, we develop a formalism to tackle this problem. 
When the number of attacks is infinite, we show that there is a magic 
number of steps m such that the optimal strategy is to run an attack 
for m steps and to try again with another attack until one succeeds. We 
also study the case of a finite number of attacks. 

We describe this problem when the attacks are exhaustive key 
searches, but the result is more general. We apply our result to the learn- 
ing parity with noise (LPN) problem and the password search problem. 
Although the optimal m decreases as the distribution is more biased, 
we observe a phase transition in all cases: the decrease is very abrupt 
from m corresponding to exhaustive search on a single target to m = 1 
corresponding to running a single step of the attack on each target. For 
all practical biased examples, we show that the best strategy is to use
m = 1. For LPN, this means to guess that the noise vector is 0 and to
solve the secret by Gaussian elimination. This is actually better than all 
variants of the Blum-Kalai-Wasserman (BKW) algorithm. 


1 Introduction 

We assume that there are an infinite number of independent keys $K_1, K_2, \ldots$ and
that we want to find at least one of these keys by trials with minimal complexity.
Each key search can be stopped and resumed. The problem is to find the optimal
strategy to run several partial key searches in a sequence. In this optimization
problem, we assume that the distributions $D_i$ for each $K_i$ are known. We denote
$D = (D_1, D_2, \ldots)$. Consider the problem of guessing a key $K_i$, drawn following
$D_i$, which is not necessarily uniform. We assume that we try all key values
exhaustively from the first to the last following a fixed ordering. If we stop the
key search on $K_i$ after $m$ trials, the sequence of trials is denoted by $i i \cdots i = i^m$.
It has a worst-case complexity $m$ and a probability of success which we denote
by $\Pr_D(i^m)$.

Instead of running parallel key searches in sequence, we could consider any
other attack which decomposes in steps of the same complexity and in which
each step has a specific probability to be the succeeding one. We assume that
the $i$th attack has a probability $\Pr_D(i^m)$ to succeed within $m$ steps and that
each step has a complexity 1. The fundamental problem is how to run
steps of these attacks in a sequence so that we minimize the complexity until
one attack succeeds. For instance, we could run attack 1 for up to $m$ steps and
decide to give up and try again with attack 2 if attack 1 fails, and so on.
We denote by $s = 1^m 2^m 3^m \cdots$ this strategy. Unsurprisingly, when the $D_i$'s are
the same, the average complexity of $s$ is the ratio $\frac{C_D(1^m)}{\Pr_D(1^m)}$, where $C_D(1^m)$ is the
expected complexity of the strategy $1^m$ which only runs attack 1 for $m$ steps$^1$
and $\Pr_D(1^m)$ is its probability of success.

Traditionally, when we want to compare single-target attacks with different
complexity $C$ and probability of success $p$, we use as a rule of thumb to
compare the ratio $\frac{C}{p}$. Quite often, we have a continuum of attacks $C(m)$ with
a number of steps limited to a variable $m$ and we tune $m$ so that $p(m)$ is a
constant such as $\frac{1}{2}$. Indeed, the curve of $m \mapsto \frac{C(m)}{p(m)}$ is often decreasing (so has
an L shape) or decreasing then increasing (with a U shape) and it is optimal to
target a fixed value of $p(m)$. But sometimes, the curve can be increasing, with a Γ shape. In
this case, it is better to run an attack with very low probability of success and
to try again until this succeeds. In some papers, e.g. [14], we consider $\frac{C(m)}{p(m)}$
as a complexity metric to compare attacks. Our framework justifies this choice.

LPN and Learning with Errors (LWE) [21] are two appealing problems in 
cryptography. In both cases, the adversary receives a matrix V and a vector 
C = Vs + D where s is a secret vector and D is a noise vector. For LPN, the best 
solving algorithm was presented in Asiacrypt 2014 [12]. It brings an improvement 
over the well-known BKW [5] and its variants [11,15]. The best algorithm has a 
sub-exponential complexity. 

Assuming that V is invertible, by guessing D we can solve for s and check it
with extra equations. So, this problem can be expressed as the one of guessing
a correct vector D of small weight, which defines a biased distribution. Here,
the distribution of D corresponds to the weighted concatenation of uniform
distributions among vectors of the same weight. We can thus study this problem
in our formalism. This was used in [8]. This algorithm is also cited in [6] and by
Lyubashevsky$^2$.

Both LPN and LWE fall in the aforementioned scenario of guessing a k-bit
biased noise vector by a simple transformation. Work on breaking cryptosystems
with biased keys was also done in [18].


$^1$ $C_D(1^m)$ can be lower than $m$ since there is a probability to succeed before reaching
the $m$th step.

$^2$ http://www.di.ens.fr/~lyubash/talks/LPN.pdf.

The guessing game that we describe in our paper also matches well the
password guessing scenario where an attacker tries to gain access to a system by
hacking an account of an employee. There exists extensive work on
cryptanalytic time-memory tradeoffs for password guessing [2-4,13,19,20], but the
game we analyse here requires no pre-computation done by the attacker.

Our Results. We develop a formalism to compare strategies and derive some
useful lemmas. We show that when we can run an infinite number of independent
attacks of the same distribution, an optimal strategy is of the form $1^m 2^m 3^m \cdots$
and it has complexity

$$\min_m \frac{C_D(1^m)}{\Pr_D(1^m)}$$

for some “magic” value $m$. This justifies the rule of thumb to compare attacks
with different probabilities of success.
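As a small illustration of the ratio above, the following Python sketch evaluates $C_D(1^m)/\Pr_D(1^m)$ for every $m$ and returns a minimizing value. The function names `ratio` and `magic_m` and the two sample distributions are assumptions made for this illustration only; they do not come from the paper.

```python
from fractions import Fraction

def ratio(p, m):
    """C_D(1^m) / Pr_D(1^m) for one attack with step probabilities p (sum 1)."""
    success = sum(p[:m])                                  # Pr_D(1^m) = p_1 + ... + p_m
    # Success at step t costs t; a failure after m steps costs m.
    cost = sum((t + 1) * p[t] for t in range(m)) + m * (1 - success)
    return cost / success

def magic_m(p):
    """Smallest m in 1..n minimizing the ratio (assumes p[0] > 0)."""
    return min(range(1, len(p) + 1), key=lambda m: ratio(p, m))

# Hypothetical distributions over 8 values: one uniform, one strongly biased.
uniform = [Fraction(1, 8)] * 8
biased = [Fraction(1, 2)] + [Fraction(1, 14)] * 7

print(magic_m(uniform), ratio(uniform, 8))
print(magic_m(biased), ratio(biased, 1))
```

Exact rational arithmetic is used so that ties and comparisons between ratios are not affected by rounding.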

When the probability that an attack succeeds at each new step decreases (e.g.,
because we try possible key values in decreasing order of likelihood), there are
two remarkable extreme cases: $m = n$ (where $n$ is the maximal number of steps)
corresponds to the normal single-target exhaustive search with a complexity
equal to the guesswork entropy [17] of the distribution; $m = 1$ corresponds to
trying attacks for a single step until it works, with complexity $1/p_1 = 2^{H_\infty}$, where
$H_\infty = -\log_2 p_1$ is the min-entropy of the distribution.

When looking at the “magic” value m in terms of the distribution D , we 
observe that in many cases there is a phase transition: when D is very close to 
uniform, we have m = n. As soon as it becomes slightly biased, we have m = 1. 
There is no graceful decrease from m = n to m = 1. 

We also treat the case where we have a finite number $|D|$ of independent
attacks to run. We show that there is an optimal “magic” sequence $m_1, m_2, \ldots$
such that an optimal strategy has the form

$$1^{m_1} 2^{m_1} \cdots |D|^{m_1} \, 1^{m_2} 2^{m_2} \cdots |D|^{m_2} \cdots$$

The best strategy is first to run all attacks for $m_1$ steps in a sequence, then to
continue to run them for $m_2$ steps in a sequence, and so on.

Although our results look pretty natural, we show that there are distributions
making the analysis counter-intuitive. Proving these results is actually
non-trivial.

We apply this formalism to LPN by guessing the noise vector then performing
a Gaussian elimination to extract the secret. The optimal m decreases as the
probability r to have an error in a parity bit decreases from $\frac{1}{2}$. For $r = \frac{1}{2}$, the
optimal m corresponds to a normal exhaustive search. For r < where k 

is the length of the secret, the optimal m is 1: this corresponds to guessing that 
we have no noise at all. So, there is a phase transition. 

Furthermore, for LPN with $r = k^{-1/2}$, which is what is used in many
cryptographic constructions, the obtained complexity is $\mathrm{poly} \cdot e^{\sqrt{k}}$, which is much better
than the usual $\mathrm{poly} \cdot 2^{k/\log_2 k}$ that we obtain for variants of the BKW algorithm [6].
More generally, we obtain a complexity of $\mathrm{poly} \cdot e^{-k \ln(1 - r)}$. It is not better than
the BKW variants for constant r but becomes interesting when r < 1( ^ 2 fe . 

When the number of samples is limited in the LPN problem with r = 
we can still solve it with complexity e °(Ak(\nk) ) which is better than 
with the BKW variants [16]. 

For LWE, the phase transition is similar, but the algorithm for m = 1 is not
better than the BKW variants. This is because the zero noise has a much lower
probability under the discrete Gaussian distribution in $\mathbb{Z}_q$ used in LWE
than in LPN (where it is $1 - r$).

For password search, we tried several empirical distributions of passwords
and again obtained that the optimal m is m = 1. So, the complexity is $2^{H_\infty}$.

Besides the 3 problems we study here, we believe that our results can prove 
to be useful in other cryptographic applications. 

Structure of the Paper. Section 2 formalizes the problem and presents a few 
useful results. In Sect. 3 we characterize the optimal strategies and show they 
can be given a special regular structure. We then apply this in Sect. 4 with LPN 
and password recovery. Due to lack of space, we do the same for LWE in the full 
version of this paper. We study the phase transition of the “magic” number m 
in Sect. 5 and conclude in Sect. 6. 

2 The STEP Game 

In this section we introduce our framework through which we address the
fundamental question of what is the best strategy to succeed in at least one attack
when we can step several independent attacks. Let $D = (D_1, D_2, \ldots)$ be a tuple
of independent distributions. If it is finite, $|D|$ denotes the number of
distributions. We formalize our framework as a game where we have a ppt adversary
$\mathcal{A}$ and an oracle that has a sequence of keys $(K_1, K_2, \ldots)$ where $K_i \leftarrow D_i$. At
the beginning, the oracle assigns the keys according to their distribution. These
distributions are known to the adversary $\mathcal{A}$. The adversary will test each key
$K_i$ by exhaustive search following a given ordering of possible values. We can
assume that values are sorted by decreasing order of likelihood to obtain a
minimal complexity but this is not necessary in our analysis. We only assume a fixed
order. So, our framework generalizes to other types of attacks in which we
cannot choose the order of the steps. Each test on $K_i$ corresponds to a step in the
exhaustive search for $K_i$. In general, we write “$i$” in a sequence to denote that
we run one new step of the $i$th attack. The sequence of “$i$”s defines a strategy
$s$. It can be finite or not. The sequence of steps we follow is thus a sequence
of indices. For instance, $i^m$ means “run the $K_i$ search for $m$ steps”. The oracle
is an algorithm that has a special command: STEP($i$). When queried with the
command STEP($i$), the oracle runs one more step of the $i$th attack (so, it
increments a counter $t_i$ and tests if $K_i = t_i$, assuming that possible key values are
numbered from 1). If this happens then the adversary wins. The adversary wins
as soon as one attack succeeds (i.e., he guesses one of the keys from the sequence
$K_1, K_2, \ldots$).

Definition 1 (Strategies). Let $D$ be a sequence of distributions $D =
(D_1, \ldots, D_{|D|})$ (where $|D|$ can be infinite or not). A strategy for $D$ is a sequence
$s$ of indices between 1 and $|D|$. It corresponds to Algorithm 1. We let $\Pr_D(s)$ be
the probability that the strategy succeeds and $C_D(s)$ be the expected number of
STEP when running the algorithm until it stops. We say that the strategy is full
if $\Pr_D(s) = 1$ and that it is partial otherwise.


Algorithm 1. Strategy $s$ in the STEP game
    initialize attacks $1, \ldots, |D|$
    for $j = 1$ to $|s|$ do
        STEP($s_j$): run one more step of the attack $s_j$ and stop if succeeded
    end for
    stop (the algorithm fails)


For example, for $s = 11223344 \cdots$, Algorithm 1 tests the first two values for each
key.
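The STEP oracle and Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class name `StepOracle`, the function `run_strategy`, and the fixed keys are hypothetical, and the keys are set by hand rather than sampled from distributions $D_i$.

```python
class StepOracle:
    """Holds one secret key per attack; key values are numbered from 1."""

    def __init__(self, keys):
        self.keys = keys                        # attack index -> secret value
        self.counter = {i: 0 for i in keys}     # values already tested per attack

    def step(self, i):
        """STEP(i): test the next value for attack i; True if the key is found."""
        self.counter[i] += 1
        return self.keys[i] == self.counter[i]

def run_strategy(oracle, strategy):
    """Algorithm 1: number of STEP calls used on success, None on failure."""
    for used, i in enumerate(strategy, start=1):
        if oracle.step(i):
            return used
    return None

# Hypothetical keys: attack 1 succeeds at its 3rd step, attack 2 at its 2nd.
oracle = StepOracle({1: 3, 2: 2, 3: 1})
print(run_strategy(oracle, [1, 1, 2, 2, 3, 3]))
```

With the strategy 112233 the first two values are tested for each key in turn, as in the example in the text.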

Definition 2 (Distributions). A distribution $D_i$ over a set of size $n$ is a
sequence of probabilities $D_i = (p_1, \ldots, p_n)$ of sum 1 such that $p_j \ge 0$ for
$j = 1, \ldots, n$. We assume without loss of generality that $p_n \ne 0$ (otherwise, we
decrease $n$). We can equivalently specify the distribution $D_i$ in an incremental
way by a sequence $D_i = [p'_1, \ldots, p'_n]$ (denoted with square brackets) such that

$$p'_j = \frac{p_j}{p_j + \cdots + p_n}, \qquad p_j = p'_j \prod_{l=1}^{j-1} (1 - p'_l)$$

for $j = 1, \ldots, n$.

We have $\Pr_D(i^j) = p_1 + \cdots + p_j = 1 - (1 - p'_1) \cdots (1 - p'_j)$, the probability of
the first $j$ values under $D_i$.
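The two descriptions of a distribution can be converted into each other mechanically. The sketch below is only illustrative: the function names and the sample distribution are assumptions, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def to_incremental(p):
    """(p_1..p_n) -> [p'_1..p'_n] with p'_j = p_j / (p_j + ... + p_n)."""
    tail, out = sum(p), []
    for pj in p:
        out.append(pj / tail)     # probability of value j given values < j failed
        tail -= pj
    return out

def from_incremental(pp):
    """[p'_1..p'_n] -> (p_1..p_n) with p_j = p'_j * prod_{l<j} (1 - p'_l)."""
    survive, out = Fraction(1), []
    for ppj in pp:
        out.append(ppj * survive)
        survive *= 1 - ppj
    return out

# Hypothetical distribution with p_n != 0, as Definition 2 assumes.
p = [Fraction(2, 3), Fraction(7, 36), Fraction(5, 36)]
print(to_incremental(p))
print(from_incremental(to_incremental(p)) == p)
```

The last incremental probability is always 1 because $p_n \ne 0$ is the only remaining value.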

When considering the key search, it may be useful to assume that
distributions are sorted by decreasing likelihood. We note that the equivalent condition
to $p_j \ge p_{j+1}$ with the incremental description is
$\frac{1}{p'_j} + j \le \frac{1}{p'_{j+1}} + j + 1$, for $j = 1, \ldots, n - 1$.

We define the distribution that the keys are not among the already tested
ones.

Definition 3 (Residual Distribution). Let $D = (D_1, \ldots, D_{|D|})$ be a sequence
of distributions and let $s$ be a strictly partial strategy for $D$ (i.e., $\Pr_D(s) < 1$).
We denote by “$D|\neg s$” the residual distribution in the case where the strategy $s$
does not succeed, i.e., the event $\neg s$ occurs.

We let $\#\mathrm{occ}_s(i)$ denote the number of occurrences of $i$ in $s$. We have

$$D|\neg s = \left(D_1|\neg 1^{\#\mathrm{occ}_s(1)}, \ldots, D_{|D|}|\neg |D|^{\#\mathrm{occ}_s(|D|)}\right)$$

where $D_i|\neg i^u = [p'_{i,u+1}, \ldots, p'_{i,n_i}]$ if $D_i = [p'_{i,1}, \ldots, p'_{i,n_i}]$. Hence, defining
distributions in the incremental way makes the residual distribution just a shift
of the original one.

We write $\Pr_D(s'|\neg s) = \Pr_{D|\neg s}(s')$ and $C_D(s'|\neg s) = C_{D|\neg s}(s')$.

Next, we prove a list of useful lemmas in order to compute complexities, 
compare strategies, etc. 

Lemma 4 (Success Probability). Let $s$ be a strategy for $D$. The success
probability is computed by

$$\Pr_D(s) = 1 - \prod_{i=1}^{|D|} \Pr_{D_i}\left(\neg i^{\#\mathrm{occ}_s(i)}\right)$$

Proof. The failure corresponds to the case where for all $i$, $K_i$ is not in
$\{1, \ldots, \#\mathrm{occ}_s(i)\}$. The independence of the $K_i$ implies the result. □

Lemma 5 (Complexity of Concatenated Strategies). Let $ss'$ be a strategy
for $D$ obtained by concatenating the sequences $s$ and $s'$. If $\Pr_D(s) = 1$, we have
$\Pr_D(ss') = \Pr_D(s)$ and $C_D(ss') = C_D(s)$. Otherwise, we have

$$\Pr_D(ss') = \Pr_D(s) + (1 - \Pr_D(s)) \Pr_D(s'|\neg s)$$

$$C_D(ss') = C_D(s) + (1 - \Pr_D(s)) \, C_D(s'|\neg s)$$

Proof. The first equation is trivial from the definition of residual distributions
and conditional probabilities.

The prefix strategy $s$ succeeds with probability $\Pr_D(s)$. Let $c$ be the
complexity of $s$ conditioned on the event that $s$ succeeds. Clearly, the
complexity of $ss'$ conditioned on this event is equal to $c$. The complexity of $ss'$
conditioned on the opposite event is equal to $|s| + C_D(s'|\neg s)$. So, $C_D(ss') =
\Pr_D(s) c + (1 - \Pr_D(s))(|s| + C_D(s'|\neg s))$. The complexity of $s$ conditioned on
the event that $s$ fails is equal to $|s|$. So, $C_D(s) = \Pr_D(s) c + (1 - \Pr_D(s)) |s|$. From these
two equations, we obtain the result. □
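The concatenation rule of Lemma 5 can be checked numerically on a small instance. In the sketch below, `evaluate` computes the success probability and the expected number of steps by accumulating, for each step, the probability that all earlier steps failed, and `residual` applies the shift described in Definition 3. Both function names and the sample distributions are assumptions for illustration.

```python
from fractions import Fraction

def evaluate(strategy, dists):
    """Return (Pr_D(s), C_D(s)); dists maps index -> incremental list [p'_1, ...]."""
    done = {}                     # number of steps already run per attack
    q = Fraction(1)               # probability that all earlier steps failed
    cost = Fraction(0)
    for i in strategy:
        if q == 0:                # success is already certain: nothing more is run
            break
        cost += q                 # this step is reached with probability q
        j = done.get(i, 0)
        q *= 1 - dists[i][j]
        done[i] = j + 1
    return 1 - q, cost

def residual(strategy, dists):
    """D|not s: drop the first #occ_s(i) entries of each incremental list."""
    return {i: d[strategy.count(i):] for i, d in dists.items()}

# Hypothetical instance: two attacks with the same incremental distribution.
D1 = [Fraction(2, 3), Fraction(7, 12), Fraction(1)]
dists = {1: D1, 2: D1}
s, s2 = [1, 2], [1, 1]

pr_s, c_s = evaluate(s, dists)
_, c_tail = evaluate(s2, residual(s, dists))
print(evaluate(s + s2, dists)[1], c_s + (1 - pr_s) * c_tail)
```

Both printed values should coincide, which is the second identity of the lemma for this instance.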

Lemma 6 (Complexity with Incremental Distributions). Let $D_i =
[p'_{i,1}, \ldots, p'_{i,n_i}]$ and let $s$ be a strategy for $D = (D_1, D_2, \ldots)$. We have

$$\Pr_D(s) = 1 - \prod_{t=1}^{|s|} \left(1 - p'_{s_t, \#\mathrm{occ}_{s_1 \cdots s_t}(s_t)}\right)$$

$$C_D(s) = \sum_{t=1}^{|s|} \prod_{t'=1}^{t-1} \left(1 - p'_{s_{t'}, \#\mathrm{occ}_{s_1 \cdots s_{t'}}(s_{t'})}\right)$$

Proof. By induction, the probability that the strategy fails on the first $t - 1$ steps
is $q_t = \prod_{t'=1}^{t-1} \left(1 - p'_{s_{t'}, \#\mathrm{occ}_{s_1 \cdots s_{t'}}(s_{t'})}\right)$. We can express $C_D(s) = \sum_{t=1}^{|s|} q_t$. So, we
can deduce $\Pr_D(s)$ and $C_D(s)$. □

Example 7. For $D_1 = (p_1, \ldots, p_n) = [p'_1, \ldots, p'_n]$ and $m \le n$, due to Lemma 6
we have

$$\Pr_D(1^m) = p_1 + \cdots + p_m = 1 - (1 - p'_1) \cdots (1 - p'_m)$$

and

$$C_D(1^m) = \sum_{t=1}^{m} \prod_{j=1}^{t-1} (1 - p'_j) = \sum_{t=1}^{m} (p_t + \cdots + p_n) = p_1 + 2 p_2 + \cdots + m p_m + m p_{m+1} + \cdots + m p_n$$

The second equality uses the relations from Definition 2.

We want to concatenate an isomorphic copy $w$ of a strategy $v$ to another
strategy $u$. For this, we make sure that $w$ and $u$ have no index in common.

Definition 8 (Disjoint Copy of a Strategy). Two strategies $v$ and $w$ are
isomorphic if there exists an injective mapping $\varphi$ such that $w_t = \varphi(v_t)$ for all $t$
and $D_{\varphi(i)} = D_i$ for all $i$. So, $C_D(v) = C_D(w)$. Let $u$ and $v$ be two strategies for
$D$. Whenever possible, we define a new strategy $w = \mathrm{new}_u(v)$ such that $v$ and $w$
are isomorphic and $w$ has no index in common with $u$.

We can define it by recursion: if $w_1 = \varphi(v_1), \ldots, w_{t-1} = \varphi(v_{t-1})$ are already
defined and $\varphi(v_t)$ is not, we set it to the smallest index $i$ (if it exists) which does
not appear in $u$ nor in $w_1, \ldots, w_{t-1}$ and such that $D_i = D_{v_t}$.

For instance, if $v = 1^m$, all $D_i$ are equal, and $i$ is the minimal index which does
not appear in $u$, we have $\mathrm{new}_u(v) = i^m$.

Lemma 9 (Complexity of a Repetition of Disjoint Copies). Let $s$ be
a non-empty strategy for $D$. We define new strategies $s_{+1}, s_{+2}, \ldots$, disjoint
copies of $s$, by recursion as follows: $s_{+r} = \mathrm{new}_{s s_{+1} \cdots s_{+(r-1)}}(s)$. We assume that
$s_{+1}, s_{+2}, \ldots, s_{+(r-1)}$ can be constructed. If $\Pr_D(s) = 0$, then

$$C_D(s s_{+1} s_{+2} \cdots s_{+(r-1)}) = r \cdot C_D(s).$$

Otherwise, we have

$$C_D(s s_{+1} s_{+2} \cdots s_{+(r-1)}) = \frac{1 - (1 - \Pr_D(s))^r}{\Pr_D(s)} C_D(s)$$

For $r$ going to $\infty$, we respectively obtain $C_D(s s_{+1} s_{+2} \cdots) = +\infty$ and

$$C_D(s s_{+1} s_{+2} \cdots) = \frac{C_D(s)}{\Pr_D(s)}$$

For instance, for $s = 1^m$ and $D_i$ all equal, the disjoint isomorphic copies of $s$ are
$s_{+r} = (1 + r)^m$, i.e., we run $m$ steps of the $(1 + r)$th attack. So, $s s_{+1} s_{+2} \cdots s_{+(r-1)} =
1^m 2^m \cdots r^m$.

Proof. We prove it by induction on $r$. This is trivial for $r = 1$. Let $s_r =
s s_{+1} s_{+2} \cdots s_{+r}$. If it is true for $r - 1$, then

$$C_D(s_{r-1}) = C_D(s_{r-2}) + (1 - \Pr_D(s_{r-2})) \, C_D(s_{+(r-1)}|\neg s_{r-2})$$

$$= \begin{cases} \frac{1 - (1 - \Pr_D(s))^{r-1}}{\Pr_D(s)} C_D(s) + (1 - \Pr_D(s_{r-2})) \, C_D(s_{+(r-1)}|\neg s_{r-2}) & \text{if } \Pr_D(s) > 0 \\ (r - 1) \cdot C_D(s) + (1 - \Pr_D(s_{r-2})) \, C_D(s_{+(r-1)}|\neg s_{r-2}) & \text{if } \Pr_D(s) = 0 \end{cases}$$

Clearly, we have $1 - \Pr_D(s_{r-2}) = (1 - \Pr_D(s))^{r-1}$ and $C_D(s_{+(r-1)}|\neg s_{r-2}) =
C_D(s)$. So, we obtain the result. □
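The closed form of Lemma 9 can be compared against a direct unrolling of Lemma 5, one disjoint copy at a time. This is a sketch under assumptions: the function names are hypothetical and the inputs are simply the pair $(\Pr_D(s), C_D(s))$ of the repeated strategy.

```python
from fractions import Fraction

def repeated_cost(pr, cost, r):
    """C_D(s s_{+1} ... s_{+(r-1)}) from Pr_D(s) and C_D(s), closed form of Lemma 9."""
    if pr == 0:
        return r * cost
    return (1 - (1 - pr) ** r) / pr * cost

def direct_cost(pr, cost, r):
    """Same quantity, obtained by applying Lemma 5 once per disjoint copy."""
    total, fail = Fraction(0), Fraction(1)
    for _ in range(r):
        total += fail * cost      # a copy is started only if all earlier copies failed
        fail *= 1 - pr
    return total

# Hypothetical example: s = 1 with Pr_D(s) = 2/3 and C_D(s) = 1, repeated 3 times.
pr, cost = Fraction(2, 3), Fraction(1)
print(repeated_cost(pr, cost, 3), direct_cost(pr, cost, 3))
```

As $r$ grows, the value approaches $C_D(s)/\Pr_D(s)$, here $3/2$.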

Example 10. For all $D_i$ equal, if we let $s = 1^m$, we can compute

$$C_D(1^m 2^m \cdots r^m) = \frac{1 - (1 - \Pr_D(1^m))^r}{\Pr_D(1^m)} C_D(1^m)$$

$$= \frac{1 - (p_{m+1} + \cdots + p_n)^r}{p_1 + \cdots + p_m} \left(p_1 + 2 p_2 + \cdots + m p_m + m p_{m+1} + \cdots + m p_n\right)$$

We now consider $r = \infty$. For an infinite number of i.i.d. distributions we have

$$C_D(1^m 2^m \cdots) = \frac{C_D(1^m)}{\Pr_D(1^m)} = \frac{p_1 + 2 p_2 + \cdots + m p_m + m p_{m+1} + \cdots + m p_n}{p_1 + \cdots + p_m}$$

$$= \frac{\sum_{i=1}^{m} i p_i + m (1 - p_1 - \cdots - p_m)}{p_1 + \cdots + p_m} = G_m + m \left(\frac{1}{\Pr_D(1^m)} - 1\right)$$

where $G_m = C_{D_1|1^m}(1^m)$ and $D_1|1^m = \left(\frac{p_1}{\Pr_{D_1}(1^m)}, \ldots, \frac{p_m}{\Pr_{D_1}(1^m)}\right)$. If $D_1$ is
ordered, $G_m$ corresponds to the guesswork entropy of the key with distribution
$D_1|1^m$.

We can see two extreme cases for $s = 1^m 2^m \cdots$. On one end we have a
strategy of exhaustively searching the key until it is found, i.e., take $m = n$. On
the other extreme we have a strategy where the adversary tests just one key
before switching to another key, i.e., $m = 1$. For the sequences $s = 1 2 \cdots$ and
$s = 1^n 2^n \cdots$, i.e., $m = 1$ and $m = n$, when $D_1$ is ordered by decreasing likelihood,
we obtain the following expected complexity:

$$m = 1 \;\Rightarrow\; C_D(1 2 \cdots) = \frac{1}{p_1} = 2^{H_\infty(D_1)}$$

$$m = n \;\Rightarrow\; C_D(1^n 2^n \cdots) = C_D(1^n) = G_n$$

where $H_\infty(D_1) = -\log_2 p_1$ and $G_n$ denote the min-entropy and the guesswork entropy of
the distribution, respectively.
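The two extreme cases can be computed directly. The sketch below assumes the usual convention $H_\infty = -\log_2 \max_j p_j$, under which the $m = 1$ complexity $1/p_1$ equals $2^{H_\infty}$; the function names and the sample distribution are illustrative assumptions.

```python
import math
from fractions import Fraction

def min_entropy(p):
    """H_inf = -log2(max p), in bits."""
    return -math.log2(float(max(p)))

def guesswork(p):
    """Expected number of guesses when values are tried by decreasing likelihood."""
    ordered = sorted(p, reverse=True)
    return sum((t + 1) * pt for t, pt in enumerate(ordered))

# Hypothetical distribution, already sorted by decreasing likelihood.
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
print(1 / p[0], 2 ** min_entropy(p))   # m = 1: complexity 1/p_1
print(guesswork(p))                    # m = n: guesswork entropy G_n
```

For this sample, switching keys after a single step costs more than exhausting one key, which is the expected outcome for a distribution that is close to uniform.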

We now define a way to compare partial strategies.

Definition 11 (Strategy Comparison). We define

$$\mathrm{minC}_D(s) = \inf_{s' ; \Pr_D(ss') = 1} C_D(ss')$$

the infimum of $C_D(ss')$, i.e., the greatest of its lower bounds. We write $s \le_D s'$
if and only if $\mathrm{minC}_D(s) \le \mathrm{minC}_D(s')$. A strategy $s$ is optimal if $\mathrm{minC}_D(s) =
\mathrm{minC}_D(\emptyset)$, where $\emptyset$ is the empty strategy (i.e., the strategy running no step at
all).

So, $s$ is better than $s'$ if we can reach lower complexities by starting with $s$
instead of $s'$. The partial strategy $s$ is optimal if we can still reach the optimal
complexity when we start with $s$.

Lemma 12 (Best Prefixes are Best Strategies). If $u$ and $v$ are
permutations of each other, we have $u \le_D v$ if and only if $C_D(u) \le C_D(v)$.

Proof. Note that $\Pr_D(u) = 1$ is equivalent to $\Pr_D(v) = 1$. If $\Pr_D(u) = 1$, it
holds that $\mathrm{minC}_D(u) = C_D(u)$ and $\mathrm{minC}_D(v) = C_D(v)$. So, the result is trivial
in this case. Let us now assume that $\Pr_D(u) < 1$ and $\Pr_D(v) < 1$. For any $s'$,
by using Lemma 5 we have

$$C_D(us') = C_D(u) + (1 - \Pr_D(u)) \, C_D(s'|\neg u)$$

So,

$$\inf_{s' ; \Pr_D(us') = 1} C_D(us') = C_D(u) + (1 - \Pr_D(u)) \inf_{s' ; \Pr_D(us') = 1} C_D(s'|\neg u)$$

The same holds for $v$. Since $u$ and $v$ are permutations of each other, we have
$D|\neg u = D|\neg v$. So, $\Pr_D(us') = \Pr_D(vs')$ and $C_D(s'|\neg u) = C_D(s'|\neg v)$. Hence,
$\inf C_D(s'|\neg u) = \inf C_D(s'|\neg v)$. Furthermore, we have $\Pr_D(u) = \Pr_D(v)$. So,
$\mathrm{minC}_D(u) \le \mathrm{minC}_D(v)$ is equivalent to $C_D(u) \le C_D(v)$. □


3 Optimal Strategy 

The question we address in this paper is: what is the optimal strategy for the
adversary so that he obtains the best complexity in our STEP formalism? That
is, we try to find the optimal sequence $s$ for Algorithm 1. At first glance, we
may think that a greedy strategy, always making the step which is the most likely to
succeed, is an optimal strategy. We show below that this is wrong. Sometimes, it
is better to run a series of unlikely steps in one given attack because we can then
run a much more likely step of the same attack after these steps are complete.
However, criteria to find this strategy are not trivial at all.

The greedy algorithm is based on looking at the $i$ for which the next
applicable $p'_j$ in $D_i$ is the largest. With our formalism, this is defined as follows.

Definition 13 (Greedy Strategy). Let $s$ be a strategy for $D$. We say that $s$
is greedy if

$$\Pr_D(s_t|\neg s_1 \cdots s_{t-1}) = \max_i \Pr_D(i|\neg s_1 \cdots s_{t-1})$$

for $t = 1, \ldots, |s|$.

The following example shows that the greedy strategy is not always optimal.

Example 14. We take $|D| = \infty$ and all $D_i$ equal to $D_1 = (\frac{2}{3}, \frac{7}{36}, \frac{5}{36}) = [\frac{2}{3}, \frac{7}{12}, 1]$.
After testing the first key, we have $D|\neg 1 = (D', D_2, D_3, \ldots)$ with $D' = (\frac{7}{12}, \frac{5}{12}) =
[\frac{7}{12}, 1]$. Since $\frac{2}{3} > \frac{7}{12}$, the greedy algorithm would then test a new key and
continue testing new keys, i.e., we would have $s = 1234 \cdots$ as a greedy strategy.
By applying Lemma 5, the complexity is the solution to $c = 1 + \frac{1}{3} c$, i.e., $c = \frac{3}{2}$.
However, the one-key strategy $s = 111$ has complexity

$$\frac{2}{3} + 2 \cdot \frac{7}{36} + 3 \cdot \frac{5}{36} = \frac{53}{36} < \frac{3}{2}$$

so the greedy strategy is not the best one.

Remark: The above counterexample works even when $|D|$ is finite. If we take
$D = (D_1, D_2)$ with $D_1 = D_2 = (\frac{2}{3}, \frac{7}{36}, \frac{5}{36}) = [\frac{2}{3}, \frac{7}{12}, 1]$, the greedy approach would test
the strategy $s = 1211$ that has a complexity of

$$1 + \frac{1}{3} + \frac{1}{9} + \frac{5}{108} = \frac{161}{108}.$$

This is greater than $\frac{53}{36}$, the complexity of the strategy 111.
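The finite counterexample can be replayed in Python. The sketch below builds a greedy strategy for two attacks with incremental distribution $[\frac{2}{3}, \frac{7}{12}, 1]$ and compares its cost with that of 111. The helper names and the tie-breaking rule (lowest index first) are assumptions; the paper does not specify how ties are broken.

```python
from fractions import Fraction

def cost(strategy, dists):
    """C_D(s): sum over steps of the probability that all earlier steps failed."""
    done, q, total = {}, Fraction(1), Fraction(0)
    for i in strategy:
        total += q
        j = done.get(i, 0)
        q *= 1 - dists[i][j]
        done[i] = j + 1
    return total

def greedy(dists):
    """Always step the attack whose next incremental probability is largest."""
    done = {i: 0 for i in dists}
    s = []
    while True:
        live = [i for i in dists if done[i] < len(dists[i])]
        i = max(live, key=lambda k: dists[k][done[k]])   # ties: lowest index first
        s.append(i)
        done[i] += 1
        if dists[i][done[i] - 1] == 1:                   # this step succeeds for sure
            return s

D1 = [Fraction(2, 3), Fraction(7, 12), Fraction(1)]
dists = {1: D1, 2: D1}
g = greedy(dists)
print(g, cost(g, dists), cost([1, 1, 1], dists))
```

The greedy sequence costs slightly more than exhausting a single key, which is the point of the remark.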

Next, we note that we may have no optimal strategy, as the following example
shows.

Example 15 (Distribution with No Optimal Strategy). Let $q_i$ be an increasing
sequence of probabilities which tends towards 1 without reaching it. Let $D_i =
[q_i, q_i, \ldots, q_i, 1]$ of support $n$. We have $C_D(i^n) = \frac{1}{q_i}(1 - (1 - q_i)^n)$ which tends
towards 1 as $i$ grows. So, 1 is the best lower bound of the complexity of full
strategies. But there is no full strategy of complexity 1.

When the number of different distributions is finite, optimal strategies exist.


Lemma 16 (Existence of an Optimal Full Strategy). Let $D =
(D_1, D_2, \ldots)$ be a sequence of distributions. We assume that we have in $D$ a
finite number of different distributions. There exists a full strategy $s$ such that
$C_D(s)$ is minimal.

Proof. Clearly, $c = \inf C_D(s)$ over all full strategies $s$ is well defined. Essentially,
we want to prove that $c$ is reached by one strategy, i.e., that the infimum is a
minimum. First, if $c = \infty$, all full strategies have infinite complexity, and the
result is trivial. So, we now assume that $c < +\infty$ and we prove the result by a
diagonal argument.

We now construct $s = s_1 s_2 \cdots$ by recursion. We assume that $s_1 s_2 \cdots s_r$ is
constructed such that $\mathrm{minC}_D(s_1 s_2 \cdots s_r) = c$. We concatenate $s_1 \cdots s_r$ with $i^m$,
where $m$ is such that $\Pr_D(i^{m-1}|\neg s_1 \cdots s_r) = 0$ and $\Pr_D(i^m|\neg s_1 \cdots s_r) > 0$. The
values of $i$ to try are the ones such that $i$ appears in $s_1, \ldots, s_r$ (we have a finite
number of them), and the ones which do not appear, but we can try only one
for each different $D_i$. We take the choice minimizing $\mathrm{minC}_D(s_1 s_2 \cdots s_r i^m)$ and set
$s_{r+1} = i^m$. So, we construct a strategy $s$.

If one key $K_i$ is tested until exhaustion, we have $\Pr_D(s) = 1$. If no key is
tested until exhaustion, there is an infinite number of keys with the same distribution
$D_i$ which are tested. If $p = \Pr_D(i^m)$ is the nonzero probability with the smallest
$m$ of this distribution, there is an infinite number of tests which succeed with
probability $p$. So, $\Pr_D(s) \ge 1 - (1 - p)^\infty = 1$. In all cases, as $s$ has a probability
to succeed of 1, $s$ is a full strategy.

What remains to be proven is that $C_D(s) = c$. We now denote by $s_i$ the $i$th
step of $s$.

Let $q_t$ be the probability that $s$ fails on the first $t - 1$ steps. We have $C_D(s) =
\sum_{t=1}^{|s|} q_t$. Let $\varepsilon > 0$. For each $r$, by construction, there exists a tail strategy $v$ such
that $C_D(s_1 \cdots s_{r-1} v) \le c + \varepsilon$. Since $q_t$ is also the probability that $s_1 \cdots s_{r-1} v$
fails on the first $t - 1$ steps for $t \le r$, we have $\sum_{t=1}^{r} q_t \le C_D(s_1 \cdots s_{r-1} v) \le c + \varepsilon$.
This holds for all $r$. So, we have $C_D(s) \le c + \varepsilon$. Since this holds for all $\varepsilon > 0$, we
have $C_D(s) \le c$. Consequently, $C_D(s) = c$: $s$ is an optimal and full strategy. □

The following two results show the structure of an optimal strategy.


Theorem 17. Let $D = (D_1, D_2, \ldots)$ be a sequence of distributions. We assume
that we have in $D$ a finite number of pairwise different distributions but an
infinite number of copies of each of them in $D$. There exists a sequence of indices
$i_1 < i_2 < \cdots$ and an integer $m$ such that $D_{i_1} = D_{i_2} = \cdots$ and $s = i_1^m i_2^m \cdots$ is
an optimal strategy of complexity $\frac{C_D(i_1^m)}{\Pr_D(i_1^m)}$.

Here are examples of optimal $m$ for different distributions.

Example 18 (Uniform Distribution). For the uniform distribution, $p_i = \frac{1}{n}$ with
$1 \le i \le n$. We get $\Pr_D(1^m) = \frac{m}{n}$ and $G_m = \frac{m+1}{2}$. With this we obtain
$C_D(1^m 2^m \cdots) = n - \frac{m-1}{2}$. Thus, the value of $m$ that minimizes the complexity
is $m = n$ and $C_D(1^m 2^m \cdots) = \frac{n+1}{2}$. The best strategy is to exhaustively search
the key until it is found.

Example 19 (Geometric Distribution). For the geometric distribution with
parameter $p$, we have $p_i = (1 - p)^{i-1} p$, with $i = 1, 2, \ldots$ or $D_i = [p, p, \ldots]$. Due to
Lemma 5, we can see that for every infinite strategy $s$, $C_D(s) = \frac{1}{p}$.

In Appendix A we study concatenations of uniform distributions.

We note that Theorem 17 does not extend if some distribution has a finite
number of copies, as the following example shows.

Example 20 (Distribution with No Optimal Strategy of the Form $i_1^m i_2^m \cdots$). Let
$D_1 = [1 - \varepsilon, \varepsilon, \ldots, \varepsilon, 1]$ of support $n$ and $D_2 = D_3 = \cdots = [p, \ldots, p, 1]$ for
$\varepsilon < p < \frac{1}{2}$ and $n$ large enough. Given a full strategy $s$, the formula in Lemma 6
defines a sequence $q_t(s) = p'_{s_t, \#\mathrm{occ}_{s_1 \cdots s_t}(s_t)}$. We can see that for all full strategies
$s$ and $s'$, if $|s| \le |s'|$ and $q_t(s) \ge q_t(s')$ for $t = 1, \ldots, |s|$, then $C_D(s) \le C_D(s')$.
With this, we can see that $s = 1 2^n$ is better than all full strategies with length
at least $n + 1$. There are only two full strategies with smaller length: $1^n$ and
$2^n$. We have $C_D(2^n) = \frac{1 - (1 - p)^n}{p} \approx \frac{1}{p} > 2$ as $n$ grows. We have $C_D(1 2^n) =
1 + \varepsilon \frac{1 - (1 - p)^n}{p} \approx 1 + \frac{\varepsilon}{p}$ as $n$ grows, so $C_D(1 2^n) < C_D(2^n)$ for $n$ large enough. We
have $C_D(1^n) = 1 + \varepsilon \frac{1 - (1 - \varepsilon)^{n-1}}{\varepsilon} = 2 - (1 - \varepsilon)^{n-1} \approx 2$ so $C_D(1 2^n) < C_D(1^n)$ for
$n$ large enough. For all strategies of length at least $n + 1$, $s = 1 2^n$ collected the
largest possible $p'$ values. So, the best strategy is $s = 1 2^n$. It is better than any
strategy of the form $i_1^m i_2^m \cdots$.

When we have a finite number of distributions, we may have no optimal strategy
of the form in Theorem 17. We may have multiple layers of repetition of $i^m$ as
the following result shows.

Theorem 21. Let $D_1$ be a distribution of finite support $n$. Let $D =
(D_1, D_2, \ldots, D_{|D|})$ be a finite sequence of length $|D|$ in which $D_1 = D_2 = \cdots =
D_{|D|}$. There exists a sequence $m_1, \ldots, m_r$ such that the strategy

$$s = 1^{m_1} 2^{m_1} \cdots |D|^{m_1} \, 1^{m_2} 2^{m_2} \cdots |D|^{m_2} \cdots 1^{m_r} 2^{m_r} \cdots |D|^{m_r}$$

is optimal.

We provide toy examples below.

Example 22. We take $D = (D_1, D_2)$ with $D_1 = D_2 = (\frac{3}{5}, \frac{9}{25}, \frac{1}{50}, \frac{1}{50}) = [\frac{3}{5}, \frac{9}{10}, \frac{1}{2}, 1]$. Here are the complexities of some full strategies.

$C_D(1111) = \frac{146}{100} = 1.46$
$C_D(12111) = \frac{792}{500} = 1.584$
$C_D(11211) = \frac{732}{500} = 1.464$
$C_D(121211) = \frac{7892}{5000} = 1.5784$
$C_D(112211) = \frac{7292}{5000} = 1.4584$

so the last strategy is the best one. Notice that this is also a greedy strategy.

Example 23. We take $D = (D_1, D_2)$ with $D_1 = D_2 =$ (^, ^ jgg, jgg, TSo) = [too • §! 1 1]. Here are the complexities of some full strategies.

$C_D(111111) = 1.48$
$C_D(1211111) = 1.44$
$C_D(12121111) = 1.438$
$C_D(121212111) = 1.439$
$C_D(121122111) = 1.444$

so $s = 12121111$ is the best one. For this example we have that the optimal
strategy requires $m_1 = 1$, $m_2 = 1$ and $m_3 = 4$. It is also greedy.

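The complexities of such toy examples can be recomputed from the definition of the STEP game: the expected number of steps is the sum, over the steps of the strategy, of the probability that no key has been found before that step. The sketch below is only illustrative. The helper name `complexity` and the encoding of a strategy as a string of digits are assumptions, and each distribution is assumed to list its probabilities in the order in which candidates are tried. It is run on the distribution $(\frac{3}{5}, \frac{9}{25}, \frac{1}{50}, \frac{1}{50})$, which is consistent with the five complexities listed in Example 22.

```python
from fractions import Fraction


def complexity(dists, strategy):
    """Expected number of STEP calls of `strategy` against independent keys.

    dists[i-1] lists the probabilities of key i's candidates in trial order;
    `strategy` is a string of 1-based key indices, e.g. "112211".
    """
    tried = [0] * len(dists)  # number of candidates already tested per key
    total = Fraction(0)
    for ch in strategy:
        # Probability that no key has been found before this step.
        alive = Fraction(1)
        for dist, t in zip(dists, tried):
            alive *= 1 - sum(dist[:t])
        total += alive  # the step is executed only if the game is still running
        tried[int(ch) - 1] += 1
    return total


# Assumed toy distribution (see the lead-in), used for both keys.
d = [Fraction(3, 5), Fraction(9, 25), Fraction(1, 50), Fraction(1, 50)]
for s in ["1111", "12111", "11211", "121211", "112211"]:
    print(s, float(complexity([d, d], s)))
```

Exact rational arithmetic is used so that the results can be compared with the fractions of the example without rounding issues.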
3.1 Proof of Theorem 17


To prove the result, we first state a useful lemma. 

Lemma 24 (Is It Better to Do s or s' First?). If $s$ and $s'$ are non-empty
and have no index in common (i.e., if $s_t \ne s'_{t'}$ for all $t$ and $t'$), then $ss' \le_D s's$
if and only if $\frac{C_D(s)}{\Pr_D(s)} \le \frac{C_D(s')}{\Pr_D(s')}$ in $[0, +\infty]$, with the convention that $\frac{c}{p} = +\infty$ for
$c > 0$ and $p = 0$.

Proof. Due to Lemma 5, when $\Pr_D(s) < 1$ we have

\[ C_D(ss') = C_D(s) + (1 - \Pr_D(s))\, C_D(s' \mid \neg s). \]

Since $s'$ does not make use of the distributions which are dropped in $D \mid \neg s$, we
have $C_D(s' \mid \neg s) = C_D(s')$. So,

\[ C_D(ss') = C_D(s) + (1 - \Pr_D(s))\, C_D(s'). \]

This is also clearly the case when $\Pr_D(s) = 1$. Similarly,

\[ C_D(s's) = C_D(s') + (1 - \Pr_D(s'))\, C_D(s). \]

So, $C_D(ss') \le C_D(s's)$ is equivalent to

\[ C_D(s) + (1 - \Pr_D(s))\, C_D(s') \le C_D(s') + (1 - \Pr_D(s'))\, C_D(s). \]

After cancelling the common terms, this reads $\Pr_D(s')\, C_D(s) \le \Pr_D(s)\, C_D(s')$. So, this inequality is equivalent to $\frac{C_D(s)}{\Pr_D(s)} \le \frac{C_D(s')}{\Pr_D(s')}$. □
We can now prove Theorem 17. 

Proof (of Theorem 17). Due to Lemma 16, we know that optimal full strategies
exist. Let $s$ be one of these. We let $i$ be the index of an arbitrary key which is
tested in $s$. We can write $s = u_0 i^{m_1} u_1 i^{m_2} \cdots i^{m_r} u_r$ where $i$ appears in no $u_j$ and
$m_j > 0$ for all $j$, and $u_1, \ldots, u_{r-1}$ are non-empty.

Since $s$ is optimal, by permuting $i^{m_j}$ and either $u_{j-1}$ or $u_j$, we obtain larger
complexities. So, by applying Lemma 24, we obtain

\[ \frac{C_D(i^{m_1})}{\Pr_D(i^{m_1})} \le \frac{C_D(u_1 \mid \neg u_0)}{\Pr_D(u_1 \mid \neg u_0)} \le \frac{C_D(i^{m_2} \mid \neg i^{m_1})}{\Pr_D(i^{m_2} \mid \neg i^{m_1})} \le \cdots \le C_D(u_r \mid \neg u_0 \cdots u_{r-1}) \]


We now want to replace $u_r$ in $s$ by some isomorphic copy of $s$ which is not
overlapping with $u_0 i^{m_1} u_1 i^{m_2} \cdots i^{m_r}$. Due to the optimality of $s$, we would deduce

\[ C_D(u_r \mid \neg u_0 \cdots u_{r-1}) \le C_D(s \mid \neg u_0 \cdots u_{r-1}) = C_D(s) \]

so $\frac{C_D(i^{m_1})}{\Pr_D(i^{m_1})} \le C_D(s)$, which would imply that the repetition of isomorphic copies
of $i^{m_1}$ is at least as good as $s$, so $\frac{C_D(i^{m_1})}{\Pr_D(i^{m_1})} = C_D(s)$ due to the optimality of $s$.
But to replace $u_r$ in $s$ by the isomorphic copy of $s$, we need to rewrite the original $s$
containing $u_r$ by some isomorphic copy in which indices are left free to implement
another isomorphic copy of $s$.

For that, we split the sequence $(1, 2, 3, \ldots)$ into two subsequences $v$ and $v'$
which are non-overlapping (i.e. $v_t \ne v'_{t'}$ for all $t$ and $t'$), complete (i.e. for every
integer $j$, $v$ contains $j$ or $v'$ contains $j$), and representing each distribution with
infinite number of occurrences (i.e. for all $j$, there exist infinite sequences $t_1 < t_2 < \cdots$
and $t'_1 < t'_2 < \cdots$ such that $D_j = D_{v_{t_\ell}} = D_{v'_{t'_\ell}}$ for all $\ell$). For that, we
can just construct $v$ and $v'$ iteratively: for each $j$, if the number of $j' < j$ such
that $D_{j'} = D_j$ in $v$ or $v'$ is the same, we put $j$ in $v$, otherwise (we may have
only one more instance in $v$), we put $j$ in $v'$ (to balance again). For instance, if
all $D_i$ are equal, this construction puts all odd $j$ in $v$ and all even $j$ in $v'$. Hence,
we can define $s' = \mathrm{new}_v(s)$ and $s'' = \mathrm{new}_{v'}(s)$. $s'$ will thus only use indices in $v'$
while $s''$ will only use indices in $v$. Therefore, $s'$ and $s''$ will be isomorphic, with
no index in common. So, $C_D(s) = C_D(s') = C_D(s'')$.

Following the split of $s$, the strategy $s'$ can be written
$s' = u'_0 i'^{m_1} u'_1 i'^{m_2} \cdots i'^{m_r} u'_r$ with

\[ \frac{C_D(i^{m_1})}{\Pr_D(i^{m_1})} = \frac{C_D(i'^{m_1})}{\Pr_D(i'^{m_1})} \le \cdots \le C_D(u'_r \mid \neg u'_0 \cdots u'_{r-1}) = C_D(u'_r \mid \neg u'_0 i'^{m_1} u'_1 i'^{m_2} \cdots i'^{m_r}) \]

If we replace $u'_r$ in $s'$ by $s''$, since $s'$ is optimal, we obtain a larger complexity.
So,

\[ C_D(u'_0 i'^{m_1} u'_1 i'^{m_2} \cdots i'^{m_r} u'_r) \le C_D(u'_0 i'^{m_1} u'_1 i'^{m_2} \cdots i'^{m_r} s'') \]

These two strategies have the prefix $u'_0 i'^{m_1} u'_1 i'^{m_2} \cdots i'^{m_r}$ in common. We can
write their complexities by splitting this common prefix using Lemma 5. By
eliminating the common terms, we deduce

\[ C_D(u'_r \mid \neg u'_0 i'^{m_1} u'_1 i'^{m_2} \cdots i'^{m_r}) \le C_D(s'' \mid \neg u'_0 i'^{m_1} u'_1 i'^{m_2} \cdots i'^{m_r}) = C_D(s'') = C_D(s) \]

We deduce

\[ \frac{C_D(i^{m_1})}{\Pr_D(i^{m_1})} \le C_D(s) \]

Let $i_1 < i_2 < \cdots$ be a sequence of keys using the distribution $D_i$. By Lemma 9,
the strategy $i_1^{m_1} i_2^{m_1} \cdots$ has complexity $\frac{C_D(i^{m_1})}{\Pr_D(i^{m_1})}$. Since $s$ is optimal, we have
$\frac{C_D(i^{m_1})}{\Pr_D(i^{m_1})} \ge C_D(s)$. Therefore, $\frac{C_D(i^{m_1})}{\Pr_D(i^{m_1})} = C_D(s)$. □


3.2 Proof of Theorem 21 

For the proof of Theorem 21 we need the result of the following lemma. 


718 S. Bogos and S. Vaudenay 


Lemma 25. Let $s = u i^a v j^b w$ be an optimal strategy with $n$ occurrences of each
key. We assume that $i \ne j$, $a < b$, $u$ does not end with $i$, $v$ has no occurrence of
either $i$ or $j$, and $w$ has equal number of occurrences for $i$ and $j$. Furthermore,
we assume that either $a \ne 0$, or $v$ is nonempty and starts with some $k$ such that
$u$ does not end with $k$. Then, $C_D(s) = C_D(u j^{b-a} i^a v j^a w)$.

Lemma 25 will be used in two ways.

1. For $s = u' j^c v j^b w$ with $c > 0$, $b > 0$, $v$ with no $i$ or $j$, and balanced occurrences
of $i$ and $j$ in $w$, which has the same complexity as $s' = u' j^{b+c} v w$ (so, to
apply the lemma we define $a = 0$, $u = u' j^c$, $k = j$, and $s = u' j^c i^0 v j^b w$; all
hypotheses are verified except $v$ non-empty, but the result is trivial for empty
$v$). This means that we can regroup $j^c$ and $j^b$ when they are separated by a
$v$ with no $i$ and followed by a balanced tail $w$.

2. For $s = u i^a v j^b w$ with $0 < a < b$, $v$ with no $i$ or $j$, and balanced occurrences
of $i$ and $j$ in $w$, which has the same complexity as $s' = u j^{b-a} i^a v j^a w$. This
means that we can balance $i^a$ and $j^b$ when they are separated by a $v$ with
no $i$ or $j$ and followed by a balanced tail $w$.

The proof of Lemma 25 is given in Appendix B. 

In what follows, we say that a strategy is in a normal form if for all $t$, $i \mapsto \#\mathrm{occ}_{s_1 \cdots s_t}(i)$
is a non-increasing function, i.e. $\#\mathrm{occ}_{s_1 \cdots s_t}(i) \ge \#\mathrm{occ}_{s_1 \cdots s_t}(i+1)$
for all $i$. For instance, 1112322133 is normal as the number of STEP(1) is at no
time lower than the number of STEP(2) and the same for the number of STEP(2)
and STEP(3).

Since all distributions are the same, all strategies can be rewritten into an
equivalent one in a normal form: for this, for the smallest $t$ such that there
exists $i$ such that $\#\mathrm{occ}_{s_1 \cdots s_t}(i) < \#\mathrm{occ}_{s_1 \cdots s_t}(i+1)$, it must be that $s_t = i+1$
and $\#\mathrm{occ}_{s_1 \cdots s_{t-1}}(i) = \#\mathrm{occ}_{s_1 \cdots s_{t-1}}(i+1)$. We can permute all values $i$ and $i+1$
in the tail $s_t s_{t+1} \cdots$ and obtain an equivalent strategy on which the function
becomes non-increasing at step $t$ and is unchanged before. By performing enough
such rewriting, we obtain an equivalent strategy in normal form. For instance,
12231332 is not normal. The smallest $t$ is $t = 3$ when we make a second STEP(2)
while we only did a single STEP(1). So, we permute 1 and 2 at this time and
obtain 12132331. Then, we have $t = 7$ and permute 2 and 3 to obtain 12132321.
Then, again $t = 7$ to permute 1 and 2 to obtain 12132312 which is normal.
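The rewriting into normal form described above can be sketched as a short procedure. This is only an illustration: the function names `first_violation` and `normalize` and the encoding of a strategy as a list of integers are assumptions, not notation from the paper.

```python
from collections import Counter


def first_violation(s):
    """Return (t, i) for the first 0-based position t at which key i has been
    stepped fewer times than key i+1, or None if s is in normal form."""
    occ = Counter()
    for t, key in enumerate(s):
        occ[key] += 1
        # Only the pair (key-1, key) can become unbalanced at this step.
        if key > 1 and occ[key - 1] < occ[key]:
            return t, key - 1
    return None


def normalize(s):
    """Swap i and i+1 in the tail starting at each violation until none is left."""
    s = list(s)
    while (hit := first_violation(s)) is not None:
        t, i = hit
        swap = {i: i + 1, i + 1: i}
        s[t:] = [swap.get(key, key) for key in s[t:]]
    return s


print(normalize([1, 2, 2, 3, 1, 3, 3, 2]))
```

On the strategy 12231332 this goes through the same intermediate strategies 12132331 and 12132321 as in the text before reaching 12132312.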

We now prove Theorem 21. 

Proof (of Theorem 21). Let $s$ be an optimal strategy. Due to the assumptions,
it must be finite. We assume w.l.o.g. that $s$ is in normal form. We note that
we can always complete $s$ in a form $s 2^{a_2} 3^{a_3} \cdots$ so that the final strategy has
exactly $n$ occurrences of each $i$. So, we assume w.l.o.g. that $s$ has equal number of
occurrences. We write $s = 1^{m_1} x_1 1^{m_2} x_2 \cdots 1^{m_r} x_r$ where the $x_t$'s are non-empty
and with no 1 inside.

As detailed below, we rewrite $x_r$ (and push some steps earlier in $x_{r-1}$) so
that we obtain a permutation of the blocks $2^{m_r}, \ldots, |D|^{m_r}$. The rewriting is
done by preserving the probability of success (which is 1) and the complexity
(which is the optimal complexity). Then, we do the same operation in $x_{r-1}$
and continue until $x_1$. When we are done, each $x_t$ becomes a permutation of
the blocks $2^{m_t}, \ldots, |D|^{m_t}$. Finally, we normalize the obtained rewriting of $s$ and
obtain the result.

We assume that $s$ has already been rewritten so that for each $t' = t+1, \ldots, r$,
the $x_{t'}$ sub-strategy is a permutation of the blocks $2^{m_{t'}}, \ldots, |D|^{m_{t'}}$. Then, we
explain how to rewrite $x_t$. We make a loop for $j = 2$ to $|D|$. In the loop, we
first regroup all blocks of $j$'s by using Lemma 25 with $i = 1$: while we can write
$x_t = u' j^c v j^b w'$ where $c > 0$, $b > 0$, $v$ is non-empty with no $j$, and $w'$ has no
$j$, we write $u = 1^{m_1} x_1 1^{m_2} x_2 \cdots 1^{m_t} u'$ and $w = w' 1^{m_{t+1}} x_{t+1} \cdots 1^{m_r} x_r$, and set
$a = 0$ and $i = 1$. This rewrites $x_t = u' j^{b+c} v w'$ by preserving the complexity and
making a permutation. When this while loop is complete, we can only find a
single block of $j$'s in $x_t$ and write $x_t = v j^b w'$, where $v$ and $w'$ have no $j$. So, we
apply again Lemma 25 to balance $1^{m_t}$ and $j^b$: we write $u = 1^{m_1} x_1 1^{m_2} x_2 \cdots x_{t-1}$
and $w = w' 1^{m_{t+1}} x_{t+1} \cdots 1^{m_r} x_r$, and set $a = m_t$ and $i = 1$. This rewrites $1^{m_t} x_t$
to $j^{b-m_t} 1^{m_t} v j^{m_t} w'$ by preserving the complexity and making a permutation.

So, this rewrites $x_t$ to $v j^{m_t} w'$ and $x_{t-1}$ to $x_{t-1} j^{b-m_t}$. When the loop of $j$ is
complete, $x_t$ is a permutation of the blocks $2^{m_t}, \ldots, |D|^{m_t}$.

Interestingly, the sequence $m_1, \ldots, m_r$ is unchanged from our starting optimal
normal full strategy $s$. If we rather start from an optimal full strategy $s$
which is not in normal form, we can still see how to obtain this sequence: for
each $t$, $m_1 + \cdots + m_t$ is the next record number of steps for an attack $i$ after
the $m_1 + \cdots + m_{t-1}$ record. That is the number of steps for the attack $i$ when
$s$ decides to move to another attack. □

3.3 Finding the Optimal m 

We provide here a simple criterion for the optimal m of Theorem 17. 

Lemma 26. We let $D_1 = (p_1, \ldots, p_n) = [p'_1, \ldots, p'_n]$ be a distribution and define
$D = (D_1, D_1, \ldots)$. Let $m$ be such that $s = 1^m 2^m \cdots$ is an optimal strategy based
on Theorem 17. We have $\frac{1}{p'_m} \le C_D(1^m 2^m \cdots) \le \frac{1}{p'_{m+1}}$.

Proof. We let $s = 2^m 3^m \cdots$. We know that $C_D(1^{m+1} s) \ge C_D(1^m s)$ since $1^m s$ is
optimal. So,

\[ 0 \le C_D(1^{m+1} s) - C_D(1^m s) = (1 - \Pr(1^m)) \left( C_D(1s \mid \neg 1^m) - C_D(s) \right) = (1 - \Pr(1^m)) \left( 1 - p'_{m+1} \cdot C_D(s) \right) \]

from which we deduce $\frac{1}{p'_{m+1}} \ge C_D(s)$. Similarly, we have

\[ 0 \ge C_D(1^m s) - C_D(1^{m-1} s) = (1 - \Pr(1^{m-1})) \left( C_D(1s \mid \neg 1^{m-1}) - C_D(s) \right) = (1 - \Pr(1^{m-1})) \left( 1 - p'_m \cdot C_D(s) \right) \]

from which we deduce $\frac{1}{p'_m} \le C_D(s)$. □

We note that if $p_m = p_{m+1}$, then

\[ p'_{m+1} = \frac{p_{m+1}}{p_{m+1} + \cdots + p_n} = \frac{p_m}{p_{m+1} + \cdots + p_n} > \frac{p_m}{p_m + p_{m+1} + \cdots + p_n} = p'_m \]

which is impossible (given the result from Lemma 26). Consequently, we must
have $p_m \ne p_{m+1}$. So, in distributions when we have sequences of equal probabilities
$p_t$, we can just look at the largest index $t$ in the sequence as a possible
candidate for being the value $m$.

Lemma 26 has an equivalent for Theorem 21 (given in the full version of this 
paper due to lack of space). 
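The criterion of Lemma 26 can be checked numerically on a small distribution. The following sketch uses the identity $C_D(1^m 2^m \cdots) = C_D(1^m)/\Pr_D(1^m)$ for identical distributions. The function names and the four-point distribution are illustrative assumptions, and the bounds are only a necessary condition for $m$ to be optimal.

```python
from fractions import Fraction


def block_complexity(p, m):
    """C_D(1^m 2^m ...) = C_D(1^m) / Pr_D(1^m) for identical distributions.

    p lists the candidate probabilities in decreasing order.
    """
    # Step j+1 of a block is reached only if the key is not among the first j candidates.
    cost = sum(1 - sum(p[:j]) for j in range(m))
    return cost / sum(p[:m])


def lemma26_holds(p, m):
    """Check 1/p'_m <= C_D(1^m 2^m ...) <= 1/p'_{m+1}, where p'_t = p_t / (p_t + ... + p_n)."""
    c = block_complexity(p, m)
    lower = sum(p[m - 1:]) / p[m - 1]
    # For m = n there is no p'_{m+1}, so only the lower bound applies.
    upper_ok = m == len(p) or c <= sum(p[m:]) / p[m]
    return lower <= c and upper_ok


# Hypothetical distribution chosen for illustration only.
p = [Fraction(3, 5), Fraction(9, 25), Fraction(1, 50), Fraction(1, 50)]
best_m = min(range(1, len(p) + 1), key=lambda m: block_complexity(p, m))
print(best_m, block_complexity(p, best_m), lemma26_holds(p, best_m))
```

Here the value $m = 3$ fails the test, in line with the remark that $m$ cannot fall inside a run of equal probabilities ($p_3 = p_4$ in this distribution).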

4 Applications 
4.1 Solving Sparse LPN 

We will model the Learning Parity with Noise (LPN) problem in our STEP game. 
As we will see, we use the noise bits as the keys the adversary A is trying to 
guess. First of all, we formally give the definition of the LPN problem. 

Definition 27 (Search LPN). Let $s \xleftarrow{U} \mathbb{Z}_2^k$, let $\tau \in\, ]0, \frac{1}{2}[$ be a constant noise
parameter and let $\mathrm{Ber}_\tau$ be the Bernoulli distribution with parameter $\tau$. Denote
by $D_{s,\tau}$ the distribution defined as

\[ \{ (v, c) \mid v \xleftarrow{U} \mathbb{Z}_2^k,\ c = \langle v, s \rangle \oplus d,\ d \leftarrow \mathrm{Ber}_\tau \} \in \mathbb{Z}_2^{k+1}. \]

An LPN oracle is an oracle which outputs independent random samples
according to $D_{s,\tau}$.

Given queries from the oracle, the search LPN problem is to find the
secret $s$.

As studied in [6], the LPN-solving algorithms which are based on BKW [5] have
a complexity $\mathrm{poly} \cdot 2^{\frac{k}{\log_2 k}}$. The naive algorithm guessing that the noise is 0 and
running a Gaussian elimination until this finds the correct solution works with
complexity $\mathrm{poly} \cdot (1-\tau)^{-k}$. So, the latter is much better as soon as $\tau < \frac{\ln 2}{\log_2 k}$,
and in particular for $\tau = k^{-\frac{1}{2}}$ which is the case for some applications [1,9].
Experiments reported in [6] also show that for $\tau = k^{-\frac{1}{2}}$, the Gaussian elimination
outperforms the BKW variants for $k > 500$.

The Gaussian elimination algorithm just reduces to finding a $k$-bit noise
vector. It guesses that this vector is 0. If this does not work, the algorithm
tries again with new LPN queries. We can see this as guessing at least one
$k$-bit biased vector $K_i$ which follows the distribution $D_i = \mathrm{Ber}_\tau^k$ defined by
$\Pr[K_i = v] = \tau^{\mathrm{HW}(v)} (1-\tau)^{k-\mathrm{HW}(v)}$ in our framework. The most probable vector
is $v = 0$ which has probability $\Pr[K_i = 0] = (1-\tau)^k$. The above algorithm
corresponds to trying $K_1 = 0$ then $K_2 = 0$, ... i.e., the strategy $123\cdots$ in our
framework. We can wonder if there is a better $1^m 2^m 3^m \cdots$. This is the problem
we study below. We will see that the answer is no: using $m = 1$ is the best option
as soon as $\tau$ is less than $\frac{1}{2} - \varepsilon$ for $\varepsilon = \frac{\ln 2}{2k}$ which is pretty small.

For instance, for LPN$_{768, \frac{1}{\sqrt{768}}}$ we obtain $C_D(12\cdots) = 2^{41}$. I.e., $2^{41}$ calls to
the STEP command which corresponds to collecting $k$ LPN queries and making
a Gaussian elimination to recover the secret based on the assumption that the
error bits are all 0. If we add up the cost of running Gaussian elimination in
order to recover the secret, we obtain a complexity of $2^{70}$. This outperforms all
the BKW variants and proves that LPN$_{768, \frac{1}{\sqrt{768}}}$ is not a secure instance for
80-bit security. Furthermore, this algorithm outperforms even the covering code
algorithm [12]. Our results are strengthened by the results from [6] where we see
that there is a big difference between the performance of $C_D(12\cdots)$ and the one
of the covering code algorithm.
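The figure of $2^{41}$ STEP calls follows from $C_D(12\cdots) = (1-\tau)^{-k}$, the inverse of the probability that a batch of $k$ queries is noise-free. A minimal check, where the function name is an assumption introduced for illustration:

```python
import math


def log2_steps_zero_noise(k, tau):
    """log2 of (1 - tau)^(-k): expected number of STEP calls when every batch
    of k queries is assumed to be noise-free."""
    return -k * math.log2(1 - tau)


k = 768
print(log2_steps_zero_noise(k, 1 / math.sqrt(k)))
```

The result is slightly below 41, which matches the rounded exponent quoted in the text.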

$D_i$ is a composite distribution of uniform ones in the sense defined in
Appendix A. Namely, $D_i = \sum_{w=0}^{k} \binom{k}{w} \tau^w (1-\tau)^{k-w} U_w$ where $U_w$ is uniform of support
$\binom{k}{w}$. By Theorem 17, we know that there exists a magic $m$ for which the
strategy $s = 1^m 2^m \cdots$ is optimal. The analysis of composite distributions further
says that $m$ must be of form $m = B_w = \sum_{i=0}^{w} \binom{k}{i}$ for some magic $w$. Let
$c_m$ be the complexity of $1^m 2^m \cdots$. A value $w = k$, i.e. $m = n$ corresponds to the
exhaustive search of the noise bits. For $w = 0$, i.e. $m = 1$, the adversary assumes
that the noise is 0 every time he receives $k$ queries from the LPN oracle.

We first computed experimentally the optimal $m$ for the LPN$_{100,\tau}$ instance
where we take $0 < \tau < \frac{1}{2}$. The magic $m$ takes the value 1 for a $\tau$ which is
not close to $\frac{1}{2}$. As shown on Fig. 1, it changes to $n = 2^{100}$ around the value
$\tau = 0.4965$. This boundary between two different strategies corresponds to the
value $\tau = \frac{1}{2} - \frac{\ln 2}{2k}$ computed in our analysis below. Interestingly, there is no
intermediate optimal $m$ between 1 and $n$.

Fig. 1. The change of optimal $m$ for solving LPN$_{100,\tau}$ (horizontal axis: $\tau$)
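An experiment of this kind can be sketched with exact arithmetic. The code below evaluates $c_{B_w} = C_D(1^{B_w})/\Pr_D(1^{B_w})$ for every weight $w$, using the composite-distribution formula of Appendix A with uniform blocks of size $\binom{k}{w}$. The function names are assumptions, and this is not the authors' experimental code.

```python
from fractions import Fraction
from math import comb


def lpn_block_complexities(k, tau):
    """Return [c_{B_0}, ..., c_{B_k}] for the noise distribution Ber_tau^k."""
    costs = []
    cost = Fraction(0)   # sum over enumerated weights of alpha_j * (B_{j-1} + (n_j + 1)/2)
    found = Fraction(0)  # Pr[Hamming weight <= w]
    size = 0             # B_w: number of vectors enumerated so far
    for w in range(k + 1):
        n_w = comb(k, w)
        alpha = n_w * tau**w * (1 - tau) ** (k - w)
        cost += alpha * (size + Fraction(n_w + 1, 2))
        found += alpha
        size += n_w
        # A failed block costs B_w steps; divide by the success probability of a block.
        costs.append((cost + size * (1 - found)) / found)
    return costs


def best_weight(k, tau):
    costs = lpn_block_complexities(k, tau)
    return min(range(k + 1), key=costs.__getitem__)


print(best_weight(100, Fraction(1, 10)), best_weight(100, Fraction(499, 1000)))
```

Passing `tau` as a `Fraction` keeps all quantities exact, which matters near $\tau = \frac{1}{2}$ where the candidates differ by very small relative amounts.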


For Cryptographic Parameters, $c_1$ is Optimal. The optimal $w$ depends on $\tau$. The
case when $\tau$ is lower than $\frac{1}{k}$ is not interesting as it is likely that no error occurs
so all $w$ lead to a complexity which is very close to 1. Conversely, for $\tau = \frac{1}{2}$ the
exhaustive search has a complexity of $c_n = \frac{1}{2}(2^k + 1)$ and $w = 0$ has a complexity
of $c_1 = 2^k$. Actually, $D_i$ is uniform in this case and we know that the optimal $m$
completes batches of equal consecutive probabilities. So, the optimal strategy is
the exhaustive search.

We now show that for $\tau < 0.16$, the best strategy is obtained for $w = 0$.

Below, we use $p_{B_w} = \tau^w (1-\tau)^{k-w}$ and $c_1 = (1-\tau)^{-k}$.

Let $w_c$ be a threshold weight and let $\alpha = \Pr_D(1^{B_{w_c}})$. For $0 < w \le w_c$, due to
Lemma 26, if $c_{B_w}$ is optimal we have

\[ c_{B_w} \ge \frac{1}{p'_{B_w}} = \frac{\Pr_D(\neg 1^{B_w - 1})}{p_{B_w}} \ge \frac{\Pr_D(\neg 1^{B_{w_c}})}{p_{B_w}} = \frac{1-\alpha}{p_{B_w}} \ge \frac{1-\alpha}{\tau (1-\tau)^{k-1}} = c_1 (1-\alpha) \frac{1-\tau}{\tau} \]

For $\tau < 0.16$, we have $\frac{\tau}{1-\tau} < 0.20$. So, if $\alpha \le \frac{4}{5}$ we obtain $c_{B_w} > c_1$. This
contradicts that $w$ is optimal. For $w_c = \tau k$, the Central Limit Theorem gives us
that $\alpha \approx \frac{1}{2}$ which is less than $\frac{4}{5}$. So, no $w$ such that $0 < w \le \tau k$ is optimal.
Now, for $w \ge w_c$, we have

\[ c_{B_w} = \frac{C_D(1^{B_w})}{\Pr_D(1^{B_w})} \ge C_D(1^{B_w}) = \sum_{i=1}^{B_w} i\, p_i + B_w \Pr(\neg 1^{B_w}) \ge B_{w_c} \Pr(\neg 1^{B_{w_c}}) = (1-\alpha) B_{w_c} \]

By using the bound $B_{w_c} \ge \left( \frac{k}{w_c} \right)^{w_c}$, for $w_c = \tau k$ we have $\alpha \approx \frac{1}{2}$ and we
obtain $c_{B_w} \ge \frac{1}{2} \tau^{-\tau k}$. We want to compare this to $c_1 = (1-\tau)^{-k}$. We look at
the variations of the function $\tau \mapsto -k\tau \ln \tau - \ln 2 + k \ln(1-\tau)$. We can see by
differentiating twice that for $\tau \in [0, \frac{1}{2}]$, this function increases then decreases. For
$\tau = 0.16$, it is positive. For $\tau = \frac{1}{k}$, it is also positive. So, for $\tau \in [\frac{1}{k}, 0.16]$, we
have $c_{B_w} \ge c_1$.

Therefore, for all $\tau < 0.16$, $c_1$ is the best complexity so $m = 1$ (i.e., $w = 0$) is the magic
value. Experiment shows that this remains true for all $\tau < \frac{1}{2} - \frac{\ln 2}{2k}$. Actually, we
can easily see that $c_1$ becomes lower than $\frac{2^k + 1}{2}$ for $\tau \approx \frac{1}{2} - \frac{\ln 2}{2k}$. We will discuss
this in Sect. 5.

Solving LPN with $O(k)$ Queries. We now concentrate on the $m = n$ case to
limit the query complexity to $O(k)$. (In our framework, we need only $k$ queries
but we would practically need more to check that we did find the correct
value). So, we estimate the complexity of the full exhaustive search on one
error vector $x$ of $k$ bits for LPN, i.e., $C_D(1^n)$. If $p_t$ is the probability that
$x$ is the $t$-th enumerated vector, we have $C_D(1^n) = \sum_{t=1}^{n} t\, p_t$. For $t$ between
$B_{w-1} + 1$ and $B_w$, the sum of the $p_t$'s is the probability that we have exactly
$w$ errors. So, $C_D(1^n) \le \sum_{w=0}^{k} B_w \Pr[w \text{ errors}]$. We approximate $\Pr[w \text{ errors}]$ to
the continuous distribution. So, the Hamming weight has a normal distribution,
with mean $k\tau$ and standard deviation $\sigma = \sqrt{k\tau(1-\tau)}$. We do the same for
$B_w \approx \frac{2^k}{\sqrt{2\pi}} \int_{-\infty}^{\frac{2w-k}{\sqrt{k}}} e^{-\frac{v^2}{2}}\, dv$. With the change of variables $w = k\tau + t\sigma$, we have

\[ C_D(1^n) \le \sum_{w=0}^{k} B_w \Pr[w \text{ errors}] \approx \frac{2^k}{2\pi} \int_{-\infty}^{+\infty} \left( \int_{-\infty}^{\frac{2w-k}{\sqrt{k}}} e^{-\frac{v^2}{2}}\, dv \right) \frac{1}{\sigma} e^{-\frac{(w - k\tau)^2}{2\sigma^2}}\, dw = \frac{2^k}{2\pi} \iint_{v \le \frac{2k\tau - k + 2t\sigma}{\sqrt{k}}} e^{-\frac{t^2 + v^2}{2}}\, dv\, dt \]

The distance between the origin $(t, v) = (0, 0)$ and the line $v = \frac{2k\tau - k + 2t\sigma}{\sqrt{k}}$ is

\[ d = \sqrt{k} \cdot \frac{1 - 2\tau}{\sqrt{1 + 4\tau(1-\tau)}} \]

By rotating the region on which we sum, we obtain

\[ C_D(1^n) \approx \frac{2^k}{2\pi} \iint_{x \ge d} e^{-\frac{x^2 + y^2}{2}}\, dx\, dy = \frac{2^k}{\sqrt{2\pi}} \int_{d}^{+\infty} e^{-\frac{x^2}{2}}\, dx \approx \frac{2^k}{d\sqrt{2\pi}} e^{-\frac{d^2}{2}} \]

On Fig. 2 we can see that this approximation of $C_D(1^n)$ is very good for $\tau = k^{-\frac{1}{2}}$.

So, the complexity $C_D(1^n)$ is asymptotically $2^{k\left(1 - \frac{1}{2\ln 2}\right) + O(\sqrt{k})}$. Interestingly,
the dominant part of $\log_2 C_D(1^n)$ is $0.2788 \times k$ and does not depend on $\tau$ as long
as $\tau \ll 1$. Although very good for the low $k$ that we consider, this approximation
of $C_D(1^n)$ deviates, probably because of the imprecise approximation
of the $B_w$'s. Next, we derive a bound which is much higher but asymptotically
better (the curves crossing for $k \approx 50\,000$). We now use the bound $B_w \le k^w$
and do the same computation as before. We have

\[ C_D(1^n) \le \sum_{w=0}^{k} k^w \Pr[w \text{ errors}] \approx \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} k^{k\tau + t\sigma} e^{-\frac{t^2}{2}}\, dt = \frac{e^{\frac{(\sigma \ln k)^2}{2} + k\tau \ln k}}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{(t - \sigma \ln k)^2}{2}}\, dt = e^{\frac{(\sigma \ln k)^2}{2} + k\tau \ln k} \]

So, $C_D(1^n) = e^{\frac{1}{2}\sqrt{k}(\ln k)^2 + O(\sqrt{k} \ln k)}$ for $\tau = k^{-\frac{1}{2}}$. It is better than the result
of Lyubashevsky [16] in the sense that it is asymptotically better and that we
use $O(k)$ queries instead of $k^{1+\varepsilon}$. However, this new bound for $C_D(1^n)$ is very
loose.

Outside the scenario of a sparse LPN, we display in Fig. 3 the logarithmic 
complexity to solve LPN in our STEP game when the noise parameter is constant. 


Fig. 2. $\log_2(C_D(1^n))$ vs. $\log_2\left(\frac{2^k}{d\sqrt{2\pi}} e^{-\frac{d^2}{2}}\right)$ for $\tau = k^{-\frac{1}{2}}$ (vertical axis: logarithmic time complexity)

Fig. 3. $\log_2(C_D(1^n))$ for constant $\tau$ (vertical axis: logarithmic time complexity)

Comparing $\log_2(C_D(1^n))$ with the approximation we obtained, i.e.
$\log_2\left(\frac{2^k}{d\sqrt{2\pi}} e^{-\frac{d^2}{2}}\right)$, we obtain the following results which validate our approximations (see Table 1).

Table 1. $\log_2(C_D(1^n))$ vs. $\log_2\left(\frac{2^k}{d\sqrt{2\pi}} e^{-\frac{d^2}{2}}\right)$ for $k = 2000$

    tau      log2(C_D(1^n))    log2(approximation)
    0.1      1350.04           1314.81
    0.125    1458.86           1429.33
    0.25     1794.57           1788.49
    0.4      1966.67           1966.55
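The approximation column of Table 1 can be recomputed from the closed form $\frac{2^k}{d\sqrt{2\pi}} e^{-d^2/2}$ with $d = \sqrt{k}\,\frac{1-2\tau}{\sqrt{1+4\tau(1-\tau)}}$. The sketch below works directly with base-2 logarithms to avoid overflow; the function name is an assumption introduced here.

```python
import math


def log2_exhaustive_approx(k, tau):
    """log2 of 2^k / (d * sqrt(2*pi)) * exp(-d^2 / 2)."""
    d = math.sqrt(k) * (1 - 2 * tau) / math.sqrt(1 + 4 * tau * (1 - tau))
    return k - d * d / (2 * math.log(2)) - math.log2(d * math.sqrt(2 * math.pi))


for tau in (0.1, 0.125, 0.25, 0.4):
    print(tau, round(log2_exhaustive_approx(2000, tau), 2))
```

The printed values agree with the second column of the table up to the last displayed digit.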


4.2 Password Recovery 

There are many reports nowadays of attacks on, and password leaks from, well-known
companies. From these leaks the community has studied which passwords are
the worst ones chosen by users. With these statistics in mind, we want
to find the best strategy for an outsider who has a list of users and tries to get
access to a system. The goal of the attacker is to hack
one account. He can try to hack several accounts in parallel. Within our framework,
we compute the optimal $m$ for the strategy $1^m 2^m \cdots$. In
this scenario, the strategy corresponds to making $m$ guesses for each user
until it reaches the end of the list and starting again with new guesses.

We consider the statistics that we have found for the 10 000 Top Passwords³
and the one done for the database with passwords in clear from the RockYou
hack⁴. Studies on the distribution of users' passwords were also done in [7,10,
22,23]. The first case study analyses the top 10 000 passwords from a
total of 6.5 million username-passwords leaked. The most frequent passwords are
the following:


    password     p_1 = 0.00493
    123456       p_2 = 0.00400
    12345678     p_3 = 0.00133
    1234         p_4 = 0.00089

In the case of the RockYou hack, where 32 million passwords were leaked,
the most frequent passwords and their probabilities of usage are:

    123456       p_1 = 0.009085
    12345        p_2 = 0.002471
    123456789    p_3 = 0.002400
    Password     p_4 = 0.000194

Moreover, approximately 20 % of the users used the most frequent 5 000 passwords.
What these statistics show is that users frequently choose poor and predictable
passwords. While dictionary attacks are very efficient, we study here
the case where the attacker wants to minimize the number of trials until he
gets access to the system, with no pre-computation done. By using our formulas
for computing $C_D(1^m 2^m \cdots)$, we obtain for both of the above distributions that
$m = 1$ is the optimal one. This means that the attacker tries for each username
the most probable password and on average after a couple of hundred users (for
the two studies we obtain $C_D$ to be $\approx 203$ and $\approx 110$), he will manage to access
the system. We note that having $m = 1$ is very nice as for the typical password
guessing scenario, we need to have a small $m$ to avoid complications of blocking
accounts and triggering an alarm that the system is under an attack.

³ https://xato.net/passwords/more-top-worst-passwords/#.VNiORvnF-xW
⁴ http://www.imperva.com/docs/WP-Consumer-Password-Worst-Practices.pdf
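The values 203 and 110 can be reproduced from the listed frequencies with $C_D(1^m 2^m \cdots) = C_D(1^m)/\Pr_D(1^m)$. The sketch below only uses the four most frequent passwords of each study, so it can compare $m = 1, \ldots, 4$ and nothing beyond; the function and variable names are assumptions introduced for illustration.

```python
def block_complexity(p, m):
    """C_D(1^m 2^m ...) = C_D(1^m) / Pr_D(1^m): m guesses per user, then next user."""
    # Guess j+1 on a user is made only if the first j guesses all failed.
    cost = sum(1 - sum(p[:j]) for j in range(m))
    return cost / sum(p[:m])


top_passwords = [0.00493, 0.00400, 0.00133, 0.00089]    # 10 000 Top Passwords study
rockyou = [0.009085, 0.002471, 0.002400, 0.000194]      # RockYou leak

for name, p in [("top passwords", top_passwords), ("RockYou", rockyou)]:
    costs = {m: block_complexity(p, m) for m in range(1, len(p) + 1)}
    best = min(costs, key=costs.get)
    print(name, best, round(costs[best]))
```

For $m = 1$ the formula reduces to $1/p_1$, the expected number of users to try with the single most probable password.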


5 On the Phase Transition 

Given the experience of the previous applications, we can see that for “regular”
distributions, the optimal $m$ falls from $m = n$ to the minimal $m$ as the bias of
the distribution increases. We let $n_1$ be such that $p_1 = p_2 = \cdots = p_{n_1} \ne p_{n_1+1}$
and $n_2$ be such that $p_{n_1+1} = \cdots = p_{n_1+n_2} \ne p_{n_1+n_2+1}$. Due to Lemma 26, the
magic value $m$ can only be $n_1$, $n_1 + n_2$, or more. We study here when the curves
of $C_D(1^{n_1} 2^{n_1} \cdots)$, $C_D(1^{n_1+n_2} 2^{n_1+n_2} \cdots)$, and $C_U(1^n) = \frac{n+1}{2}$ cross each other.

Lemma 28. We consider a composite distribution $D_1 = \alpha U_1 + \beta U_2 + (1 - \alpha - \beta) D'$,
where $U_1$ and $U_2$ are uniform of support $n_1$ and $n_2$. For $U$ uniform, we
have

\[ C_D(1^{n_1} 2^{n_1} \cdots) \le C_D(1^{n_1+n_2} 2^{n_1+n_2} \cdots) \iff \alpha - \beta \frac{n_1}{n_2} \ge \alpha \left( \alpha + \beta \frac{1 - n_1/n_2}{2} \right) \]

\[ C_D(1^{n_1} 2^{n_1} \cdots) \le C_U(1^n) \iff \frac{n/n_1 + 1}{2} \ge \frac{1}{\alpha} \]

Note that for $2^{-H_\infty} \ge \frac{2}{n}$, we have $\frac{\alpha}{n_1} \ge \frac{2}{n}$ so the second property is satisfied.

As an example, for $n_1 = n_2 = 1$, the first condition becomes $\alpha - \beta \ge \alpha^2$
which is the case of all the distributions we tried for password recovery. The
second condition becomes $2^{-H_\infty} \ge \frac{2}{n+1}$ which is also always satisfied.

For LPN, we have $n_1 = 1$, $n_2 = k$, $\alpha = (1-\tau)^k$, and $\beta = k\tau(1-\tau)^{k-1}$. The
first and second conditions become

\[ (1-\tau)^k \le \frac{1 - 2\tau}{1 + \frac{k-3}{2}\tau} \quad \text{and} \quad (1-\tau)^k \ge \frac{2}{2^k + 1} \]

respectively. They are always satisfied unless $\tau$ is very close to $\frac{1}{2}$: by letting
$\tau = \frac{1}{2} - \varepsilon$ with $\varepsilon \to 0$, the right-hand term of the first condition is asymptotically
equivalent to $\frac{8\varepsilon}{k+1}$ and the left-hand term tends towards $2^{-k}$. The balance is thus
for $\tau \approx \frac{1}{2} - \frac{k+1}{8} 2^{-k}$. The second condition gives

\[ \tau \le 1 - \left( 2^{k-1} + \frac{1}{2} \right)^{-\frac{1}{k}} = \frac{1}{2} - \frac{\ln 2}{2k} - o\left( \frac{1}{k} \right) \]


So, we can explain the phase transition in LPN$_k$ as follows: if we make $\tau$ decrease
from $\frac{1}{2}$, for each fixed $m$, the complexity of all possible $C_D(1^m 2^m \cdots)$ smoothly
decrease. The function for $m = n_1$ crosses the one of $m = n_1 + n_2$ before it
crosses $\frac{2^k + 1}{2}$ which is close to the value of the one for $m = n$. So, the curve for
$m = n_1$ becomes interesting after having beaten the curve for $m = n_1 + n_2$. This
proves that we never have a magic $m$ equal to $n_1 + n_2$. Presumably, it is the case
for all other curves as well. This explains the abrupt fall from $m = n$ to $m = 1$
which we observed on Fig. 1.

Proof. We have

\[ C_D(1^{n_1} 2^{n_1} \cdots) = \frac{C_D(1^{n_1})}{\Pr_D(1^{n_1})} = \frac{\alpha \frac{n_1 + 1}{2} + (1 - \alpha) n_1}{\alpha} \]

and

\[ C_D(1^{n_1+n_2} 2^{n_1+n_2} \cdots) = \frac{C_D(1^{n_1+n_2})}{\Pr_D(1^{n_1+n_2})} = \frac{\alpha \frac{n_1 + 1}{2} + \beta \left( n_1 + \frac{n_2 + 1}{2} \right) + (1 - \alpha - \beta)(n_1 + n_2)}{\alpha + \beta} \]

So,

\[ \frac{C_D(1^{n_1})}{\Pr_D(1^{n_1})} \le \frac{C_D(1^{n_1+n_2})}{\Pr_D(1^{n_1+n_2})} \iff \frac{\alpha \frac{n_1+1}{2} + (1-\alpha) n_1}{\alpha} \le \frac{\alpha \frac{n_1+1}{2} + \beta \left( n_1 + \frac{n_2+1}{2} \right) + (1 - \alpha - \beta)(n_1 + n_2)}{\alpha + \beta} \iff \alpha - \beta \frac{n_1}{n_2} \ge \alpha \left( \alpha + \beta \frac{1 - n_1/n_2}{2} \right) \]

For the second property, we have

\[ C_D(1^{n_1} 2^{n_1} \cdots) \le C_U(1^n) \iff \frac{C_D(1^{n_1})}{\Pr_D(1^{n_1})} \le C_U(1^n) \iff \frac{\alpha \frac{n_1+1}{2} + (1-\alpha) n_1}{\alpha} \le \frac{n+1}{2} \iff \frac{n/n_1 + 1}{2} \ge \frac{1}{\alpha} \]

□
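For LPN, the two conditions of Lemma 28 and the location of the phase transition can be evaluated numerically. In this sketch the function names are assumptions; the formulas are the LPN instances of the lemma with $n_1 = 1$, $n_2 = k$, $\alpha = (1-\tau)^k$ and $\beta = k\tau(1-\tau)^{k-1}$.

```python
import math


def first_condition(k, tau):
    """m = 1 is at least as good as m = 1 + k."""
    return (1 - tau) ** k <= (1 - 2 * tau) / (1 + (k - 3) / 2 * tau)


def second_condition(k, tau):
    """m = 1 is at least as good as exhaustive search on a uniform key."""
    return (1 - tau) ** k >= 2 / (2**k + 1)


def phase_transition(k):
    """Largest tau satisfying the second condition."""
    return 1 - (2 / (2**k + 1)) ** (1 / k)


k = 100
print(phase_transition(k), 0.5 - math.log(2) / (2 * k))
print(first_condition(k, 0.499), second_condition(k, 0.499))
```

For $k = 100$ the threshold is close to 0.4965, and at $\tau = 0.499$ the second condition already fails while the first one still holds, which illustrates why $m = 1 + k$ is skipped.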


6 Conclusions 

Our framework enables the analysis of different strategies to sequentialize algo- 
rithms when the objective is to make one succeed as soon as possible. 

When the algorithms have the same distribution and are unlimited in number,
the optimal strategy is of form $1^m 2^m \cdots$ for some magic $m$. As the distribution
becomes biased, we observe a phase transition from the regular single-algorithm
run $1^n$ (i.e., $m = n$) to the single-step multiple algorithms $123\cdots$
(i.e., $m = 1$) which is very abrupt in the applications we considered: LPN and
password recovery.

The phase transition phenomenon is further studied. In particular, we show 
that the fall from m = n to m = 1 does not go through any m G {2, . . . , k ^ k ^ }. 

For LPN, the solving algorithm we obtain outperforms the classical ones. 

When we have a limited number of algorithms, the optimal strategy has
the form $1^{m_1} \cdots |D|^{m_1} 1^{m_2} \cdots |D|^{m_2} \cdots$. For LPN, this simple algorithm outperforms
the classical ones, even the one from Asiacrypt 2014 [12] for the relevant
parameters using $\tau \approx k^{-\frac{1}{2}}$.




A Composite Distributions 


We give a formula to compute the optimal strategies for distributions obtained by
composing several distributions. The formula is useful when we want to regroup
equal consecutive $p_t$'s in a distribution $D_1$ so that $D_1$ appears as a composition
of uniform distributions.


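Lemma 29. Let $U_1, \ldots, U_k$ be independent distributions of support $n_1, \ldots, n_k$,
respectively. Let $U_i = (p_{i,1}, \ldots, p_{i,n_i})$. Given a distribution $(\alpha_1, \ldots, \alpha_k)$
of support $k$, we define $D_1 = \alpha_1 U_1 + \alpha_2 U_2 + \cdots + \alpha_k U_k$ by
$D_1 = (\alpha_1 p_{1,1}, \ldots, \alpha_1 p_{1,n_1}, \alpha_2 p_{2,1}, \ldots, \alpha_k p_{k,n_k})$.

Let $m = \sum_{j=1}^{i} n_j$. We have

\[ \Pr_{D_1}(1^{n_1} 1^{n_2} \cdots 1^{n_i}) = \alpha_1 + \cdots + \alpha_i \]

\[ C_{D_1}(1^{n_1} \cdots 1^{n_i}) = \sum_{j=1}^{i} \alpha_j C_{U_j}(1^{n_j}) + \sum_{j=1}^{i} n_j \left( 1 - \sum_{k=1}^{j} \alpha_k \right) \]

We note that if all $U_i$ are ordered and if $\alpha_i p_{i,n_i} \ge \alpha_{i+1} p_{i+1,1}$ for all $1 \le i < k$,
then $D_1$ is ordered as well.

We let $D = (D_1, D_1, \ldots)$. If we assume that $U_i$ are uniform distributions,
we can use the observation following Lemma 26 to deduce from Theorem 17 that
the optimal strategy is $1^m 2^m \cdots$ for $m = \sum_{j=1}^{i} n_j$ and $i$ minimizing

\[ C_D(1^m 2^m \cdots) = \frac{\sum_{j=1}^{i} \alpha_j \frac{n_j + 1}{2} + \sum_{j=1}^{i} n_j \left( 1 - \sum_{k=1}^{j} \alpha_k \right)}{\sum_{j=1}^{i} \alpha_j} \]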

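Proof. We prove it by induction on $i$. It is trivial for $i = 0$. We assume the result
holds for $i - 1$. By induction, we have

\[ C_{D_1}(1^{n_1} \cdots 1^{n_i}) = C_{D_1}(1^{n_1} \cdots 1^{n_{i-1}}) + \left( 1 - \Pr_{D_1}(1^{n_1} \cdots 1^{n_{i-1}}) \right) C_{D_1}\left( 1^{n_i} \mid \neg (1^{n_1} \cdots 1^{n_{i-1}}) \right) \]

\[ = \sum_{j=1}^{i-1} \alpha_j C_{U_j}(1^{n_j}) + \sum_{j=1}^{i-1} n_j \left( 1 - \sum_{k=1}^{j} \alpha_k \right) + \alpha_i C_{U_i}(1^{n_i}) + n_i \left( 1 - \sum_{k=1}^{i} \alpha_k \right) \]

\[ = \sum_{j=1}^{i} \alpha_j C_{U_j}(1^{n_j}) + \sum_{j=1}^{i} n_j \left( 1 - \sum_{k=1}^{j} \alpha_k \right) \]

The second equality is obtained from the fact that

\[ C_{D_1}\left( 1^{n_i} \mid \neg (1^{n_1} \cdots 1^{n_{i-1}}) \right) = \frac{\alpha_i}{\alpha_i + \cdots + \alpha_k} \left( p_{i,1} + 2 p_{i,2} + \cdots + n_i p_{i,n_i} \right) + n_i \frac{\alpha_{i+1} + \cdots + \alpha_k}{\alpha_i + \cdots + \alpha_k} \]

\[ = \frac{\alpha_i C_{U_i}(1^{n_i}) + n_i \left( 1 - \Pr_{D_1}(1^{n_1} \cdots 1^{n_i}) \right)}{1 - \Pr_{D_1}(1^{n_1} \cdots 1^{n_{i-1}})} \]

□

B Proof of Lemma 25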

Proof. We will show below that there exists $d > 0$ such that $a \le b - d$ and
$C_D(s) = C_D(u j^d i^a v j^{b-d} w)$. Hence, we can rewrite $s$ by replacing $u$ by $u j^d$ and $b$
by $b - d$. Since $d > 0$ and $a \le b - d$, we can just apply this rewriting rule enough
times until $b$ is lowered down to $a$. Hence, we obtain the result.

To find $d$, we first write $s = u_0 i^{m_1} u_1 i^{m_2} \cdots i^{m_r} u_r i^a v j^b w$ where $i$ appears in no
$u_t$, the $m_t$ are nonzero, and $u_1, \ldots, u_r$ are non-empty. (Note that since $a < b$, we
must have $m_1 + \cdots + m_r > 0$ so $r \ge 1$.) Let $n'$ be the equal number of occurrences
of $i$ and $j$ in $u i^a v j^b$. Let $t$ be the smallest index such that $m_1 + \cdots + m_t > n' - b$.
(For $t = 0$, the left-hand term is 0 but $n' \ge b$; for $t = r$, the left-hand term is
$n' - a$ and we know that $a < b$; so, $t$ exists and $t > 0$.) We write $m_t = m' + d$ such
that $m_1 + \cdots + m_{t-1} + m' = n' - b$. So, $d > 0$. Note that $b - d = b - m_t + m' =
n' - m_1 - \cdots - m_t = m_{t+1} + \cdots + m_r + a$. So, $b - d \ge a$. Clearly, $d \le b$.
We write $s = H i^d B i^a v j^d T$ with head $H = u_0 i^{m_1} u_1 i^{m_2} \cdots u_{t-1} i^{m'}$, body
$B = u_t i^{m_{t+1}} \cdots i^{m_r} u_r$, and tail $T = j^{b-d} w$. Clearly, $H$ has $n' - b$ occurrences of $i$ and
$H i^d B i^a v$ has $n' - b$ occurrences of $j$. Since $s$ is optimal for $D$, $i^d B i^a v j^d$ is optimal
for $D \mid \neg H$. We note that $B$ does not start with $i$ ($t$ is between 1 and $r$ and $u_t$ is
nonempty and with no $i$) and that $i^a v$ is non-empty and with no $j$ (either $a \ne 0$
or $v$ is nonempty and with no $j$). We split $i^d B i^a v j^d = i^d x_1 \cdots x_\ell\, i^a y_1 \cdots y_{\ell'}\, j^d$
where two consecutive blocks in the list $i^d, x_1, \ldots, x_\ell, i^a, y_1, \ldots, y_{\ell'}, j^d$ have no
key in common. (For $a = 0$, we can always split so that $x_\ell$ and $y_1$ have no key
in common by using the first term $k$ of $v$ which is not the last of $u$: we just take
$y_1$ as a block of $k$'s and $x_\ell$ as a block with no $k$.) We can apply Lemma 24 and
obtain

\[ \frac{C_D(i^d \mid \neg i^{n'-b})}{\Pr_D(i^d \mid \neg i^{n'-b})} \le \cdots \le \frac{C_D(i^a \mid \neg i^{n'-a})}{\Pr_D(i^a \mid \neg i^{n'-a})} \le \frac{C_D(y_1 \mid \neg \cdots)}{\Pr_D(y_1 \mid \neg \cdots)} \le \cdots \le \frac{C_D(y_{\ell'} \mid \neg \cdots)}{\Pr_D(y_{\ell'} \mid \neg \cdots)} \le \frac{C_D(j^d \mid \neg j^{n'-b})}{\Pr_D(j^d \mid \neg j^{n'-b})} \]

Since the first and the last terms are equal, all of them are equal. So, we can 
permute two consecutive blocks which have no index in common. Hence, we can 
propagate j d earlier until it is stepped before i a , since we know there is no other 
occurrence of j in the exchanged blocks. We obtain that 

\[ C_D(H i^d B i^a v j^d T) = C_D(H i^d B j^d i^a v T) \]

as announced. □


Abstract. We consider the task of data analysis with pure differential
privacy. We construct new and improved mechanisms for statistical
release of interval and rectangle queries. We also obtain a new algorithm
for counting over a data stream under continual observation, whose error
has optimal dependence on the data stream’s length.

A central ingredient in all of these results is a differentially private partition
mechanism. Given a set of data items drawn from a large universe,
this mechanism outputs a partition of the universe into a small number
of segments, each of which contains only a few of the data items.


1 Introduction 

Differential privacy is a recent privacy guarantee tailored to the problem of 
statistical disclosure control: how to publicly release statistical information about 
a set of people without compromising the privacy of any individual [DMNS06] 
(see the book [DR14] for an extensive treatment). In a nutshell, differential 
privacy requires that the probability distribution on the published results of an 
analysis is “essentially the same,” independent of whether any individual opts 
in to, or opts out of, the data set. (The probabilities are over the coin flips of 
the privacy mechanism.) Statistical databases are frequently created to achieve 
a social goal, and increased participation in the databases permits more accurate 
analyses. The differential privacy guarantee supports the social goal by assuring 
each individual that she incurs little risk by joining the database: anything that 
can happen is essentially equally likely to do so whether she joins or abstains. 

In the differential privacy literature, privacy is achieved by the introduction of
randomized noise into the output of an analysis. Moreover, sophisticated
mechanisms for differentially private data analysis can incur a significant efficiency
overhead. A rich and growing literature aims to minimize the “cost of privacy”
in terms of the error and also in terms of computational efficiency. In this work
we present new algorithms with improved error for several natural data analysis
tasks.

There are several variants of differential privacy that have been studied. 
Most notably, these include the stronger (in terms of privacy-protection) notion 
of pure differential privacy, and its relaxation to approximate differential privacy. 
Our work focuses on mechanisms that guarantee pure differential privacy for the 
tasks of answering statistical queries, maintaining an online count of significant 
events in a data stream, and partitioning a large universe into a small number 
of contiguous segments, none of which contains too many input items (a type of 
“dimension reduction”). 

Before proceeding to outline our contributions, we recall the definition of 
differential privacy: 

Definition 1.1 (Differential Privacy [DMNS06,DKM+06]). A randomized
algorithm $M : U^n \to Y$ is $(\varepsilon, \delta)$-differentially private if for every pair of adjacent
databases $x, x'$ that differ only in one row, and for every $S \subseteq Y$:

$\Pr[M(x) \in S] \le e^{\varepsilon} \cdot \Pr[M(x') \in S] + \delta.$

When $\delta = 0$, we say the algorithm provides (pure) $\varepsilon$-differential privacy. When
$\delta > 0$, we say that the algorithm provides (approximate) differential privacy.

As discussed above, we focus on the stronger guarantee of pure differential pri- 
vacy throughout this work. 
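
As a concrete instance of Definition 1.1 with $\delta = 0$, the following minimal sketch releases a single counting query with Laplace noise. The function names are illustrative and not taken from the paper; the sampler draws $\mathrm{Lap}(1/\varepsilon)$ as the difference of two exponential variables, a standard identity.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Lap(scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(database, predicate, epsilon, rng):
    """Release a count with Laplace noise of scale 1/epsilon (illustrative).

    Changing one row moves the true count by at most 1, so the output
    densities on adjacent databases differ by a factor of at most e^epsilon.
    """
    true_count = sum(1 for row in database if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def laplace_density(z, scale):
    """Density of Lap(scale) at the point z."""
    return math.exp(-abs(z) / scale) / (2.0 * scale)
```

The density ratio between outputs centered at true counts that differ by 1 stays within $[e^{-\varepsilon}, e^{\varepsilon}]$, which is exactly the pure guarantee for this one query.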


1.1 Differentially Private Query Release: Interval and Rectangle 
Queries 

Differentially private query release is a central problem in the literature. The 
goal is releasing the answers to a set of statistical queries while maintaining 
both differential privacy and low error. We focus on the case of counting queries 
(sometimes referred to as statistical queries). Let $U$ be the set of possible data
items (the data universe). A counting query $q$ is specified by a predicate
$q : U \to \{0, 1\}$. For an $n$-element database $x \in U^n$, the query output $q(x) \in [0, n]$
counts how many items in the database satisfy the query. The goal, given a set
$Q$ of queries and a database $x$, is to approximate $q(x)$ for each $q \in Q$, while
(i) guaranteeing differential privacy (for the collection of all answers), and (ii)
minimizing error in the answers.

We focus on the (challenging) setting where the query set $Q$ is large. To
avoid running in time proportional to $|Q|$ (which is too large), we will produce
a differentially private data synopsis. Given the database $x$, the mechanism
produces a synopsis: a data structure that can later be used to answer any query
$q \in Q$. Thus, the synopsis is a small implicit representation for the answers to
all queries in $Q$.

Differentially private query release, especially for counting queries, has been
the focus of a rich literature. Starting with the works of Dinur, Dwork and Nissim
[DN03,DN04], early works showed how to answer $k$ queries (counting queries or general
low-sensitivity queries) using computationally efficient mechanisms, with noise that
grew with $k$ for pure $\varepsilon$-DP [DMNS06], or $\sqrt{k}$ for approximate $(\varepsilon, \delta)$-DP. Starting
with the work of Blum, Ligett and Roth [BLR08], later works improved the
dependence on the number of queries $k$ to logarithmic. The running time for
these mechanisms, however, can be prohibitive in many settings. Even the
state-of-the-art mechanisms for answering general counting queries [HR10] require
running time that is at least linear in the size of the data universe $|U|$ (whereas
the running time of earlier mechanisms was logarithmic in $|U|$). Indeed, for many
query sets $Q$, the best differentially private query release mechanisms that are
known require either large error (as a function of $|Q|$), or large running time (as
a function of $|U|$ and $|Q|$). Moreover, under cryptographic assumptions, there are
inherent limits on the computational efficiency and the accuracy of differentially
private query release algorithms for specific sets of counting queries [DNR+09,
Ull13,BUV14]. Thus, a significant research effort has aimed to design efficient
and accurate DP mechanisms for specific natural sets of counting queries.

Our work continues this effort. We construct new and improved mechanisms 
for answering interval or threshold queries. We further extend these results to 
multi-dimensional rectangle queries, and for these queries we are able to increase 
the data dimensionality with relatively mild loss in accuracy and efficiency. 

Interval Queries. We consider the natural class of interval queries. Here the data
universe is the integers from 1 to $D$ (i.e. $U = [1, D]$, and $|U| = D$).¹ Each query
$q$ is specified by an interval $I = [i, j] \subseteq [1, D]$, and associated with the predicate
that outputs 1 on data elements that fall in that interval. Usually we think of $D$
as being very large, much larger than (even exponential in) the database size $n$.
For example, the data universe could represent a company’s salary information,
and interval queries approximate the number of employees whose salaries fall in
a certain bracket.

In prior work, Dwork et al. [DNPR10] showed that this class could be
answered with pure $\varepsilon$-differential privacy and error roughly $O(\frac{\log^2 D}{\varepsilon})$ (see the
analysis in [CSS11]). They also showed an $\Omega(\log D)$ error lower bound for
obtaining pure differential privacy. Our first contribution is a new mechanism that
obtains pure differential privacy with error roughly $O(\frac{\log D + \log^2 n}{\varepsilon})$. In
particular, the error’s dependence on $D$ is optimal.

Theorem 1.2 (DP Intervals). The mechanism in Sect. 3.2 answers interval
queries over $[1, D]$. For any privacy and accuracy parameters $\varepsilon, \beta > 0$, it
guarantees (pure) $\varepsilon$-differential privacy. For any database $x$ of size $n$, with all but $\beta$
probability over the mechanism’s coins, it produces a synopsis that answers all
interval queries (simultaneously) with error $O(\frac{\log D + (\log^2 n) \cdot \log(1/\beta)}{\varepsilon})$. The
running time to produce the synopsis (and to then answer any interval query) is
$(n \cdot \mathrm{poly}(\log D, \log(1/\varepsilon), \log(1/\beta)))$.


1 Throughout this work, for integers $i, j$ s.t. $i \le j$, we use the notation $[i, j]$ to denote
the (closed) interval of integers $\{i, i + 1, \ldots, j - 1, j\}$.

While the error’s dependence on $\log D$ is optimal, we do not know whether the
dependence on $\log^2 n$ is optimal (i.e., whether the error is tight for cases where
$D$ is not much larger than $n$). This remains a fascinating question for future
work.

The main idea behind this mechanism is partitioning the data universe $[1, D]$
into at most $n$ contiguous segments, where the number of items in each segment
is not too large. We give a new differentially private mechanism for constructing
such a partition, see Sects. 1.3 and 2. Given this partition, we treat the $n$
segments as a new smaller data universe, and use the algorithm of [DNPR10] to
answer interval queries on this smaller data universe (this is where we incur the
$\log^2 n$ error term).

Related Work: Approximately Private Threshold Queries. The class of interval
queries generalizes the class of threshold queries, where each query is specified
by $i \in [1, D]$ and counts how many items in the input database are larger than $i$
(i.e., how many items are in the interval $[i, D]$). In fact, since answers to threshold
queries can also be used to answer interval queries, these two classes are
equivalent. Answering threshold queries with approximate $(\varepsilon, \delta)$-DP was considered in
the work of Beimel, Nissim and Stemmer [BNS13], who obtain an upper bound
of $2^{O(\log^* D)}$. In a beautiful recent independent work, Bun, Nissim, Stemmer
and Vadhan [BNSV15] show a lower bound of $\Omega(\log^* D)$ for approximate-DP
mechanisms (as well as an improved upper bound of roughly $2^{\log^* D}$). The main
difference with our work is that we focus on the stricter guarantee of pure
differential privacy, which (provably) incurs a larger error.

Rectangle Queries. We further study a natural generalization of interval queries:
rectangle queries. These queries consider multi-dimensional data (in particular,
$c$-dimensional for an integer $c > 1$). The data universe is $U = [1, D]^c$. A rectangle
query $q$ is specified by a rectangle $R = ([i_1, j_1] \times \cdots \times [i_c, j_c]) \subseteq [1, D]^c$, and
associated with the predicate that outputs 1 on data items that fall inside the
set $R$. As was the case for interval queries, we usually think of $D$ as larger than
$n$, and of $c$ as being smaller than either of these quantities (sub-logarithmic in
$n$, or even constant). Continuing the example above, a database could contain
employees’ salaries, ages, years of experience, rank, etc. Rectangle queries can be 
used to approximate the numbers of employees that fall into various conjunctions 
of brackets, e.g. the number of employees in given age, experience and salary 
brackets. More generally, these queries are useful for multi-dimensional data, 
where many (or all) of the data dimensions are associated with an ordering on 
data items in that dimension. 

We generalize the intervals mechanism to answer rectangle queries. While
in many settings known differentially private algorithms suffer from a “curse of
dimensionality” that increases the error or running time as the dimension grows,
we give an algorithm whose error and running time have a mild dependence
on the data dimensionality. In particular, the error is roughly $O((c^2 \cdot \log D) +
(\log n)^{O(c)})$. The running time is roughly $n \cdot \mathrm{poly}(\log^c n, \log D)$, and does not
grow with $D^c$. For the (reasonable) setting of parameters where we think of
$n \approx \log D$, the running time is only polynomial in $(\log\log D)^c$.

Theorem 1.3 (DP Rectangles). The mechanism described in Sect. 3.3
answers $c$-dimensional rectangle queries over $[1, D]^c$. For any privacy and
accuracy parameters $\varepsilon, \beta > 0$, the mechanism guarantees (pure) $\varepsilon$-differential
privacy. With all but $\beta$ probability over its coins, all rectangle queries
(simultaneously) are answered with error $O(\frac{(c^2 \cdot \log D) + ((\log n)^{O(c)} \cdot \log(1/\beta))}{\varepsilon})$. The
running time to produce the synopsis (and to then answer any rectangle query)
is $(n \cdot \mathrm{poly}(\log^c n, \log D, \log(1/\varepsilon), \log(1/\beta)))$.

In prior work, Chan, Shi and Song [CSS11] considered rectangle queries and
obtained an error bound of roughly $(\log D)^{O(c)}$. Theorem 1.3 roughly replaces
this with a $(\log n)^{O(c)}$ term, as well as an additive $O(c^2 \cdot \log D)$ (recall that
typically $n \ll D$). We emphasize that the error’s dependence on $\log D$ does not
grow exponentially with the dimensionality $c$.

Muthukrishnan and Nikolov [MN12] show an $\Omega((\log n)^{c - O(1)})$ error lower
bound when $n \ll D$, even for (the relaxed notion of) $(\varepsilon, \delta)$-differentially private
algorithms (they refer to this as “orthogonal range counting”). Thus, the
dependence on $\log n$ in the mechanism of Theorem 1.3 is optimal up to a (small)
polynomial factor (the exact term in our upper bound is $O((\log n)^{1.5c+1})$).

The rectangles mechanism is a multi-dimensional generalization of the inter- 
vals mechanism (see more above and below). Recall that the intervals algorithm 
utilized a differentially private partition of the data universe into n segments. It 
then used the “tree-counter” algorithm of [DNPR10] to answer interval queries 
over these n segments. This is done by building a binary tree of noisy counts, 
whose leaves are the $n$ segments. For the rectangle mechanism, we use a
$(k, d)$-tree-like data structure (see [Ben75] and see also the rectangle mechanism of
[CSS11]), building a “tree of trees” of noisy counts along the $c$ dimensions of
the data universe (after reducing the size of each dimension using a differentially
private partition). We judiciously prune this tree to avoid an exponential blowup
in its size (the naive implementation requires time and memory $n^c$). A careful
analysis guarantees that even while we extend to $c$ dimensions, the error (as a
function of $D$) only grows to $O(c^2 \cdot \log D)$.


1.2 Counting Under Continual Observation 

Dwork, Naor, Pitassi and Rothblum [DNPR10] introduced the problem of
counting under continual observation. The goal is to monitor a stream of $D$ bits, and
continually maintain an approximation of the number of 1’s that have been
observed so far. For privacy, the entire collection of $D$ outputs (where the
$i$-th output approximates the count after processing $i$ elements) should maintain
$\varepsilon$-differential privacy, masking the value of any single bit. The canonical
application is monitoring events, such as the number of influenza patients arriving
at a medical office, or the number of users visiting a webpage (where privacy
hides any single access). Since its introduction, online counting has found many
applications. In most settings, the data stream is sparse: the number of 1’s (the
stream’s “weight”) is much smaller than $D$.

The online counter proposed by [DNPR10] (we refer to this as the “tree
counter”) had error roughly $O(\log^2 D)$ (see the analysis in [CSS11]). As an
additional contribution, we present an improved counter (with pure or $(\varepsilon, 0)$
differential privacy) for sparse streams. In particular, thinking of the input as a
boolean string where the number of 1’s is at most $n$ (and $n \ll D$), the error is
improved to roughly $(\log D + (\log^2 n))$ (compared with roughly $(\log^2 D)$ for the
tree counter). We note that the dependence on $D$ is optimal, and matches the
$\Omega(\log D)$ lower bound in [DNPR10].

Theorem 1.4. For any $\varepsilon, \beta > 0$, the online counter from Sect. 3.1 guarantees
$\varepsilon$-differential privacy. Taking $n$ to be an upper bound on the input stream’s weight,
with all but $\beta$ probability over the counter’s coins, the maximal error over all $D$
items is at most $O(\frac{\log D + (\log^2 n) \cdot \log(1/\beta)}{\varepsilon})$.

Here again, we partition the data stream (of length D ) into at most n segments, 
where the number of items in each segment is not too large. This is done using 
an online partition mechanism, which can process the items one-by-one, and 
after processing each item can decide whether a segment is large enough to be 
“sealed”, or whether to keep accumulating the current segment (see Sects. 1.3 
and 2). Given this online partition mechanism, we can run the tree counter of 
[DNPR10] (or any other counter) on its output. As we process data items, we 
don’t update the count until the current segment is sealed. When a segment is 
sealed, we feed the count within this segment into the tree counter, and obtain 
an updated count (we use here the fact that the tree counter can also operate 
on integer inputs, not just on bits). 


1.3 Differentially Private Online Partition 

As mentioned above, one of the main tools we use is a (pure) $\varepsilon$-differentially
private partition algorithm. Given an $n$-item database $x \subseteq [1, D]$, this algorithm
partitions the data universe $U = [1, D]$ into (at most) $n$ contiguous segments
$(S_1 = [1, s_1], S_2 = [s_1 + 1, s_2], \ldots, S_n = [s_{n-1} + 1, D])$ (where the $s_i$’s are all
integers). The guarantee is that w.h.p. the number of data elements in each
of these segments is small, and bounded by roughly $O(\log D)$. These partitions
are pervasive in the applications mentioned above. In a nutshell, we treat the 
segments as a new and reduced data universe. This reduces the size of the data 
universe from D to n, an exponential improvement for some of the parameter 
regimes of interest. Beyond its applications in this work, we find the partition 
mechanism to be of independent interest, and hope that it will find further 
applications. 
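
To illustrate how the segments act as a reduced data universe, here is a minimal sketch that maps each item of $[1, D]$ to the index of the segment containing it. It assumes the partition is given as the sorted list of right endpoints $s_1, \ldots, s_m$ with $s_m = D$; the function names are hypothetical and not from the paper.

```python
from bisect import bisect_left


def segment_index(segment_ends, item):
    """Return the 0-based index of the segment containing `item`.

    `segment_ends` is the sorted list [s_1, ..., s_m] with s_m = D, so
    segment k (0-based) is the integer interval [s_{k-1} + 1, s_k], s_0 = 0.
    """
    # First index k whose right endpoint satisfies s_k >= item.
    return bisect_left(segment_ends, item)


def reduce_database(segment_ends, database):
    """Map every database item to its segment index (the reduced universe)."""
    return [segment_index(segment_ends, item) for item in database]
```

Each lookup takes $O(\log m)$ comparisons, so the reduction from a universe of size $D$ to one of size at most $n$ adds little overhead.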

Theorem 1.5. For any $\varepsilon, \beta > 0$, the Partition mechanism in Sect. 2
guarantees $\varepsilon$-differential privacy. When run on a database of size $n$, with all
but $\beta$ probability over the mechanism’s coins, it outputs at most $n$ segments,
and each segment is of weight at most $\frac{5(\log D + \log(1/\beta))}{\varepsilon}$. The running time is
$n \cdot \mathrm{poly}(\log D, \log(1/\varepsilon), \log(1/\beta))$.

The Partition algorithm and its analysis are inspired by an algorithm from 
[DNPR10] for transforming a class of streaming algorithms into ones that are 
private even under continual observation. 

Another important property of this algorithm is that it can be run in an
online manner. In this setting, the input is treated as a bit-stream of length $D$.
The $i$-th input $y_i \in \{0, 1\}$ indicates whether item $i$ is in the dataset. Thus, this
is a sparse stream with total weight $n$. The partition mechanism can process
these bits one-by-one, making an online decision about when to “seal” each
segment. We use this online variant of the partition algorithm to obtain an improved
online counter.

2 Differentially Private Online Partition 

The Mechanism. The (online) partition algorithm processes the input as a
stream $x_1, \ldots, x_D \in \{0, 1\}$. We use $n$ to denote the weight of the stream (the
number of 1’s).² The output is a partition of $[D]$ into (contiguous) segments
$P = (S_1, \ldots, S_j)$, such that:

1. W.h.p. the number of segments $j$ is smaller than $n$.

2. The weight of the items in each segment is $O((\log D + \log(1/\beta))/\varepsilon)$ (where
$\varepsilon$ is the privacy parameter).

This is an online algorithm, in the sense that after processing the i-th data 
item, the algorithm either “seals” a new segment, ending at i, or it keeps the 
current segment “open” and proceeds to the next data item. We emphasize that 
the algorithm is oblivious to the input stream’s weight. The Partition algorithm 
and its analysis are inspired by an algorithm from [DNPR10] for transforming 
a class of streaming algorithms into ones that are private even under continual 
observation (see also the discussion of the “sparse vector” abstraction in [DR14]). 

Theorem 2.1. For any $\varepsilon, \beta > 0$, the Partition Algorithm of Fig. 1 guarantees
$\varepsilon$-differential privacy. Let $n$ be the total weight of the input stream. With all but
$\beta$ probability over the algorithm’s coins, it outputs at most $n$ segments, and each
segment is of weight at most $\frac{5(\log D + \log(1/\beta))}{\varepsilon}$.

Before proving the partition algorithm’s privacy and accuracy, we remark that
the dependence on $\log D$ is optimal by the lower bound of [DNPR10]. Moreover,
for an offline implementation, where the input is given as an $n$-item database
$x \subseteq [1, D]$, we can reduce the running time to $\mathrm{polylog}\, D$:


2 More generally, we could also work with a stream of integers, and the weight would
be the $L_1$ norm.

Partition$(D, \varepsilon, \beta)$

Initialize the threshold $T \leftarrow 3(\log D + \log(1/\beta))/\varepsilon$, and indices $i, j \leftarrow 0$
Repeat the following loop: (each iteration of the loop seals a new segment)

1. Initialize the $j$-th segment:

   $j \leftarrow j + 1$, $count_j \leftarrow 0$, $\hat{T}_j \leftarrow T + \mathrm{Lap}(1/\varepsilon)$

2. Repeat the following loop, processing the $i$-th data item in each iteration:

   (a) $i \leftarrow i + 1$, $count_j \leftarrow count_j + x_i$

   (b) $\widehat{count}_i \leftarrow count_j + \mathrm{Lap}(1/\varepsilon)$

   Keep the $j$-th segment open until ($\widehat{count}_i > \hat{T}_j$) or ($i \ge D$)

3. Seal the $j$-th segment: $s_j \leftarrow i$

Until ($i \ge D$). Take $m \leftarrow j$ to be the final number of segments

Output the partition $P = \{[1, s_1], [s_1 + 1, s_2], \ldots, [s_{m-1} + 1, D]\}$ and the number of segments $m$


Fig. 1. Online DP partition algorithm 
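
The loop structure of Fig. 1 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: it assumes natural logarithms, a stream of length $D \ge 1$ held in memory, and floating-point Laplace samples (which a production mechanism would need to treat more carefully). The helper name `laplace` is introduced here for illustration.

```python
import math
import random


def laplace(scale, rng):
    """Sample Lap(scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def partition(stream, epsilon, beta, rng=None):
    """Online partition of positions 1..D into contiguous segments.

    Returns the sealed right endpoints [s_1, ..., s_m], with s_m = D.
    """
    rng = rng or random.Random()
    D = len(stream)
    threshold = 3.0 * (math.log(D) + math.log(1.0 / beta)) / epsilon
    ends = []
    i = 0
    while i < D:
        # Step 1: open a new segment with a fresh noisy threshold.
        count = 0
        noisy_threshold = threshold + laplace(1.0 / epsilon, rng)
        while True:
            # Step 2: process the next item and test a fresh noisy count.
            i += 1
            count += stream[i - 1]  # x_i (the paper indexes from 1)
            noisy_count = count + laplace(1.0 / epsilon, rng)
            if noisy_count > noisy_threshold or i >= D:
                break
        # Step 3: seal the segment at the current position.
        ends.append(i)
    return ends
```

The sketch draws one Laplace sample per segment for the threshold and one per processed item for the count, matching the "at most $2D$ draws" used in the accuracy argument.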


Remark 2.2 [Efficient Offline Implementation]. For the offline setting, where
the input is an $n$-item database $x \subseteq [1, D]$, we can compute the partition in
$n \cdot \mathrm{polylog}(D)$ time as follows. We sort the $n$ items so that $x_1 \le x_2 \le \ldots \le x_n$
(where each $x_k \in [1, D]$). We then process the items one by one. When processing
the $k$-th item $x_k$, assume that the last sealed segment was sealed at $s_j$. We count
the number of database items in $[s_j + 1, x_k]$. This gives a certain probability $p_k$
that the $(j + 1)$-th segment will be sealed at $x_k$. Until the $(j + 1)$-th segment is
sealed, for every $y \in [x_k, x_{k+1} - 1]$, the probability that the $(j + 1)$-th segment
is sealed at $y$ remains $p_k$ (because there are no additional items processed).
We can now sample in $\mathrm{polylog}\, D$ time whether the segment is sealed in the
range $[x_k, x_{k+1} - 1]$. If we sample that the segment is sealed at some $y^*$ in this
range, then we run the above process again starting at $y^*$ (with a new probability
computed from the updated true count, which becomes 0). If not, then we run it
again starting at $x_{k+1}$ (again from the updated count, which is incremented).
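
One way to realize the sampling step of Remark 2.2 is sketched below, under an assumption not spelled out in the remark: conditioned on the segment's noisy threshold, each position in a gap seals independently with the same probability, so the first sealing position is geometric and can be drawn from a single uniform sample. The function names are hypothetical.

```python
import math
import random


def seal_probability(true_count, noisy_threshold, epsilon):
    """Pr[true_count + Lap(1/epsilon) > noisy_threshold]."""
    gap = noisy_threshold - true_count
    if gap >= 0:
        return 0.5 * math.exp(-epsilon * gap)
    return 1.0 - 0.5 * math.exp(epsilon * gap)


def sample_seal_in_gap(p, start, end, rng):
    """Sample the first position in [start, end] at which the segment seals.

    Every position seals independently with probability p. Returns the
    sealing position, or None if no position in the range seals.
    """
    if p <= 0.0:
        return None
    if p >= 1.0:
        return start
    u = 1.0 - rng.random()  # uniform in (0, 1]
    # Number of failures before the first success (inverse transform).
    failures = math.log(u) / math.log1p(-p)
    if failures > end - start:
        return None
    return start + int(failures)
```

The cost per gap is a constant number of arithmetic operations, independent of the gap length $x_{k+1} - x_k$, which is what keeps the total running time near-linear in $n$.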

Proof (Proof of Theorem 2.1). We argue privacy and accuracy: 

Privacy. Fix databases $x, x'$ which differ in the $i$-th data item (for $i \in [D]$).
Consider a partition $P$. Take $S_j \in P$ s.t. $i \in S_j$. Since the data streams are
identical up to $S_j$, the probabilities of generating the prefix $S_1, \ldots, S_{j-1}$ are
identical on $x$ and $x'$ (for any choice of random coins made in the first $j - 1$
segments, the outcome on both databases is identical). Below, we bound the
ratio between the probabilities of generating $S_j = [s_{j-1} + 1, s_j]$ as the $j$-th segment
in both runs. After generating $S_j$ as the $j$-th segment, the probabilities of the
partition’s suffix when running on the two databases are again identical, because
the data are identical and no state is carried over (beyond the boundary $s_j$ of
the $j$-th segment).

We show a bijection between noise values when running on $x$ and on $x'$, such
that for any noise value producing $S_j$ on $x$, the bijection gives a noise value of
similar probability that produces the same output on $x'$. We conclude that the
probability $p'$ of producing $S_j$ on $x'$ is not much smaller than the probability $p$
of producing $S_j$ on $x$, which implies differential privacy.

Towards this, take $\hat{T}_j, \hat{T}'_j$ to be the $j$-th noisy thresholds in a run on $x$ and on
$x'$ (respectively), and similarly take $\widehat{count}_{s_j}$ and $\widehat{count}'_{s_j}$ to be the noisy counts
at $s_j$ in runs on $x$ and on $x'$ (in the bijection below, the relations between these
two refer to the Laplace noise added to the true count). The bijection is defined as follows:

- For the case $x_i = 0$ and $x'_i = 1$, take:

  $\hat{T}'_j = \hat{T}_j + 1$, $\widehat{count}'_{s_j} = \widehat{count}_{s_j}$

  All other noise values are unchanged in the two runs. This bijection guarantees
  that if no item before $s_j$ sealed the $j$-th segment on $x$, then no item before $s_j$
  will seal the $j$-th segment on $x'$ (whose count can only be larger by at most
  1 at any point in the segment). Moreover, if $s_j$ seals the $j$-th segment on $x$,
  then it will also seal the $j$-th segment on $x'$ (because the noisy threshold there
  is larger by 1, and the count at $s_j$ is larger by 1 in $x'$).

- For the case $x_i = 1$ and $x'_i = 0$, take:

  $\hat{T}'_j = \hat{T}_j$, $\widehat{count}'_{s_j} = \widehat{count}_{s_j} + 1$

  All other noise values are unchanged in the two runs. This bijection guarantees
  that if no item before $s_j$ sealed the $j$-th segment on $x$, then no item before
  $s_j$ will seal the $j$-th segment on $x'$ (whose count can only be smaller at any
  point in the segment). Moreover, if $s_j$ seals the $j$-th segment on $x$, then it will
  also seal the $j$-th segment on $x'$ (because the noise added at $s_j$ is larger
  by 1, which compensates for the true count at $s_j$ being smaller by 1 in $x'$).

Since the bijection changed the magnitude of a single draw from $\mathrm{Lap}(1/\varepsilon)$
by at most 1, we conclude that $p' \ge e^{-\varepsilon} \cdot p$, and the algorithm is $\varepsilon$-differentially
private.
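
The last step relies on a standard property of the Laplace density. As a short worked derivation (stated here for completeness, with $f$ denoting the density of $\mathrm{Lap}(1/\varepsilon)$, a symbol introduced for this note):

```latex
f(z) = \frac{\varepsilon}{2}\, e^{-\varepsilon |z|},
\qquad
\frac{f(z')}{f(z)} = e^{-\varepsilon\,(|z'| - |z|)}
\;\ge\; e^{-\varepsilon\,|z' - z|}
\;\ge\; e^{-\varepsilon}
\quad \text{whenever } |z' - z| \le 1 .
```

Shifting a single noise draw by at most 1 therefore changes its density by a factor of at least $e^{-\varepsilon}$, and integrating over all noise values that produce $S_j$ gives the stated bound on $p'$.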

Accuracy. By construction, the algorithm makes at most $2D$ draws from the
$\mathrm{Lap}(1/\varepsilon)$ distribution. By the properties of the Laplace distribution, with all
but $\beta$ probability, all of these draws will have magnitude at most $((\log D +
\log 2 + \log(1/\beta))/\varepsilon)$. Condition on this event for the remainder of the proof.
Under this conditioning, whenever $\widehat{count} > \hat{T}$, we have that $count$ (the true
count within the segment) is greater than 0, and so all the segments are
non-empty, and there can be at most $n$ segments (because there are only $n$ items
in the dataset). Moreover, under the above conditioning, as soon as we have
$count > 5(\log D + \log(1/\beta))/\varepsilon$, we also have $\widehat{count} > \hat{T}$, and so no segment can
have weight larger than $5(\log D + \log(1/\beta))/\varepsilon$.
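
The tail bound invoked at the start of the accuracy argument can be spelled out as follows (a worked step, assuming natural logarithms; $Z$ denotes a single draw and is notation introduced for this note):

```latex
Z \sim \mathrm{Lap}(1/\varepsilon):
\quad \Pr\big[\,|Z| > t\,\big] = e^{-\varepsilon t},
\qquad
t = \frac{\log D + \log 2 + \log(1/\beta)}{\varepsilon}
\;\Longrightarrow\;
\Pr\big[\,|Z| > t\,\big] = \frac{\beta}{2D} .
```

A union bound over the at most $2D$ draws then shows that all of them have magnitude at most $t$ except with probability $\beta$.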

3 From Partitions to Counting, Intervals and Rectangles 

In this section we apply the partition algorithm to obtain improved differentially
private mechanisms for online counting, and for answering interval and rectangle
queries.

3.1 Online Counting Under Continual Observation

Counting under continual observation was first studied by [DNPR10], and has
emerged as an important primitive with many applications. Given a stream of
$D$ data items (integers or boolean values), the goal is to process the items
one-by-one. After processing the $i$-th item, the counter outputs an approximation
to the sum of items $(1 \ldots i)$. Taken together, the counter’s $D$ outputs should be
differentially private, and mask a change of 1 in any particular data item (flipping
a bit if the values are boolean, or adding/subtracting 1 if they are integers).
A $(D, \alpha, \beta)$-counter guarantees that with all but $\beta$ probability over its own coins,
all $D$ estimates it outputs (simultaneously) have error bounded by $\alpha$.

Recap: The “Tree Counter”. For privacy and error parameters $\varepsilon, \beta > 0$, the
“tree counter” of [DNPR10] is an $\varepsilon$-differentially private $(D, O(((\log^2 D) \cdot
\log(1/\beta))/\varepsilon), \beta)$-counter: w.h.p., for all $D$ outputs simultaneously, the error is
bounded by roughly $(\log^2 D)$. The counter works by building a binary tree over
the interval $[1, D]$. Each data item is a leaf in the tree, and each internal node at
height $\ell$ (where leaves are at height 0) “covers” a sub-segment of length $2^\ell$. The
$(D/2^\ell)$ nodes at height $\ell$ partition the interval $[1, D]$ into sub-segments of length
$2^\ell$. The online counter maintains a noisy sum for the items in each internal node
(filling up these counts as the items $(1, \ldots, D)$ are processed). To estimate the
number of items in some segment $[1, k]$, they observe that the segment is exactly
covered by at most $\log D$ internal nodes of the tree. The counter outputs the sum
of these internal nodes as its estimate. The noise for each internal node is drawn
from a Laplace distribution with magnitude $O(\log D/\varepsilon)$, so the sum of noises
from the $\log D$ noise values is $O(\log^2 D)$ w.h.p. (the error analysis in [DNPR10]
is a bit more slack, see [CSS11]). Privacy follows because any “leaf” (i.e. input
element) only affects the counts of the $\log D$ internal nodes that “cover” it.
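
The tree of noisy sums can be sketched as below. This is a simplified batch illustration, not the online counter of [DNPR10]: it assumes the stream length is a power of two, stores a noisy sum at every level including the leaves, and uses noise scale (number of levels)$/\varepsilon$, since each item contributes to exactly one node per level. The `noise` parameter is a hypothetical hook for injecting a sampler.

```python
import random


def laplace(scale, rng):
    """Sample Lap(scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_noisy_tree(items, epsilon, rng, noise=laplace):
    """Return levels[h][b] = noisy sum of block b of length 2**h.

    Assumes len(items) is a power of two.
    """
    num_levels = len(items).bit_length()  # log2(D) + 1 levels
    scale = num_levels / epsilon
    levels, sums = [], list(items)
    for _ in range(num_levels):
        levels.append([s + noise(scale, rng) for s in sums])
        # Exact sums of the next level up; noise is added per level above.
        sums = [sums[2 * b] + sums[2 * b + 1] for b in range(len(sums) // 2)]
    return levels


def prefix_estimate(levels, k):
    """Estimate the sum of items 1..k using at most one node per level."""
    total, start = 0.0, 0
    for h in reversed(range(len(levels))):
        if start + (1 << h) <= k:
            total += levels[h][start >> h]
            start += 1 << h
    return total
```

Each prefix $[1, k]$ is covered by the blocks named in the binary expansion of $k$, so an estimate sums at most one noisy value per level.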

Improved Online Counter via Partitions. We show that the (online) partition
algorithm of Sect. 2 gives an improved online counter (with pure or $(\varepsilon, 0)$
differential privacy) for the case of sparse streams. In particular, thinking of the input
as a boolean string where the number of 1’s is at most $n$ (and $n \ll D$), the
error is improved to roughly $(\log D + (\log^2 n))$ (compared with roughly $(\log^2 D)$
for the tree counter). We note that the dependence on $D$ is optimal, and matches
the $\Omega(\log D)$ lower bound in [DNPR10]. We note that the counter was conceived
for (and is usually applied to) scenarios where $D$ is much larger than $n$.

The improved counter operates by running any online counter (and in partic- 
ular the tree counter) “on top of” a partition obtained from the (online) partition 
algorithm. Initializing the count to 0, we process each new data item using the 
partition algorithm. If the algorithm keeps the current segment open, then we 
simply maintain the current count. If the algorithm seals a segment, then we 
“feed” that segment into the (online) counter as a new data item (using the true 
number of l’s in the current segment). We then update the current count using 
the counter’s output. I.e., the segments of the partition now form the “leaves”
of the tree used in the [DNPR10] online counter.³ By differential privacy of the
partition algorithm and the counter, the output of this composed algorithm is 
also differentially private. 
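
The composition described above can be sketched as follows. This is an illustration under stated assumptions: the inner counter is any object with a `feed(value)` method returning its current estimate (a hypothetical interface), and `ExactCounter` is a non-private stand-in used only to keep the example runnable. A private instantiation would plug in a differentially private counter and, following the proof of Theorem 3.1, give each of the two components half of the privacy budget.

```python
import math
import random


def laplace(scale, rng):
    """Sample Lap(scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class ExactCounter:
    """Non-private stand-in for the inner counter (illustration only)."""

    def __init__(self):
        self.total = 0

    def feed(self, value):
        self.total += value
        return self.total


def composed_counter(stream, epsilon, beta, inner, rng):
    """Return one estimate per item; it changes only when a segment seals."""
    D = len(stream)
    threshold = 3.0 * (math.log(D) + math.log(1.0 / beta)) / epsilon
    noisy_threshold = threshold + laplace(1.0 / epsilon, rng)
    segment_count, current, outputs = 0, 0, []
    for i, bit in enumerate(stream, start=1):
        segment_count += bit
        noisy_count = segment_count + laplace(1.0 / epsilon, rng)
        if noisy_count > noisy_threshold or i == D:
            # Segment sealed: feed its true weight to the inner counter.
            current = inner.feed(segment_count)
            segment_count = 0
            noisy_threshold = threshold + laplace(1.0 / epsilon, rng)
        outputs.append(current)
    return outputs
```

Between seals the output is held constant, which is the source of the additive error term bounded by the weight of an open segment.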

Theorem 3.1. Composing the Partition algorithm from Fig. 1 with the online
tree counter from [DNPR10] gives an online counter. For any $\varepsilon, \beta > 0$, the
composed algorithm guarantees $\varepsilon$-differential privacy. Let $n$ be an upper bound
on the input stream’s weight. With all but $\beta$ probability over the counter’s coins,
the maximal error over all $D$ items is at most $O(\frac{\log D + ((\log^2 n) \cdot \log(1/\beta))}{\varepsilon})$.

Proof. We run the partition algorithm with privacy parameter $(\varepsilon/2)$ and error
parameter $(\beta/2)$. By Theorem 2.1, with all but $(\beta/2)$ probability, the online
partition algorithm seals at most $n$ segments, where the (true) number of 1’s in
each segment is at most $O(\frac{\log D + \log(1/\beta)}{\varepsilon})$. We then run the tree counter on this
“stream” of $n$ segments, with privacy parameter $(\varepsilon/2)$ and error parameter $(\beta/2)$.
The partition into segments is $(\varepsilon/2)$-DP, and the output of the tree counter on
the “stream” of $n$ segments (given the true count in each of these segments) is
also $(\varepsilon/2)$-DP. By composition of DP mechanisms, the complete output of the
composed mechanism is $\varepsilon$-DP. For accuracy:

1. By the error guarantee of the tree counter, the $n$ counts obtained when
segments are sealed have error at most $O(\frac{(\log^2 n) \cdot \log(1/\beta)}{\varepsilon})$ (with all but a $(\beta/2)$
probability of error).

2. By the segment-size guarantee of the partition algorithm, the true count
in an “open” segment that hasn’t been sealed yet is bounded. Thus, the
fact that counts are not updated before a segment is sealed incurs only an
$O(\frac{\log D + \log(1/\beta)}{\varepsilon})$ additional (additive) error for the $(D - n)$ items that do not
“seal” a segment.

By a union bound, with all but $\beta$ probability, the total error is

$O(\frac{\log D + ((\log^2 n) \cdot \log(1/\beta))}{\varepsilon})$.

3.2 Interval Queries 

To answer interval queries on a database $x \subseteq [1, D]$, we run the partition
algorithm and obtain a privacy-preserving partition of $[1, D]$ into (at most) $n$ disjoint
segments $(S_1, \ldots, S_n)$, where w.h.p. the count of items in each segment is small.
We then construct a binary tree “on top of” these $n$ segments, as in the improved
online counter (see Sect. 3.1). I.e., the $n$ segments are the tree’s leaves, and each
internal node at height $h$ “covers” $2^h$ segments. For each node in this tree,
covering an interval $[i, j]$, we add independent Laplace noise of magnitude $(\log n/\varepsilon)$,
and release the (noisy) size of the intersection $x \cap [i, j]$ (the number of 1’s in the
interval). Privacy follows because the partition is DP, and given the partition
any data item only changes the counts in $\log n$ nodes of the tree. Note that this
offline algorithm can be implemented in time $\mathrm{poly}(n, \log D)$ (see Remark 2.2).

3 We note that, in general, we could compose any online counter with the partition
algorithm. We are not using any specific properties of the tree counter.

Given the tree of noisy counts, we can answer any interval query $I = [i, j]$
as follows. First, observe that any such interval can be “covered” by at most
$2\log n$ nodes of the tree: a collection of nodes whose (disjoint) leaves form the
(minimal) collection of segments whose union contains $I$. To find such a cover,
consider the lowest node $k$ in the tree such that the segments in its sub-tree cover
the interval $I$ (but this is not true for either of $k$’s children). Now the “left”
and “right” parts of the interval $I$ are the parts contained in the left or right
sub-trees of $k$ (respectively). The left part of $I$ is covered by at most log n nodes
in the left sub-tree, and the right part of $I$ is covered by at most log n nodes
in the right sub-tree. Note that we can also find this cover efficiently. Once the
above cover is obtained, we answer the query by simply outputting the sum of
(noisy) counts of the nodes that cover the interval. Accuracy follows from the fact
that the counts in each segment are small, and the noise in the sum of noisy
counts is also small.
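The tree of noisy counts and the cover-based query can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the partition has already been computed, that n is a power of two, and that a query is given as the indices of its first and last segment. The names `build_noisy_tree`, `cover` and `answer_interval` are hypothetical, and sampling Laplace noise as a difference of two exponentials is a choice made for this sketch.

```python
import random

def laplace(scale, rng):
    """Sample Laplace noise with the given scale (scale 0 means no noise)."""
    if scale == 0:
        return 0.0
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def build_noisy_tree(segment_counts, scale, rng):
    """Return levels[h][i]: noisy count of segments i*2^h .. (i+1)*2^h - 1."""
    n = len(segment_counts)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    true_level = list(segment_counts)
    levels = []
    while True:
        levels.append([c + laplace(scale, rng) for c in true_level])
        if len(true_level) == 1:
            return levels
        # Each parent's true count is the sum of its two children.
        true_level = [true_level[2 * i] + true_level[2 * i + 1]
                      for i in range(len(true_level) // 2)]

def cover(lo, hi):
    """Nodes (height, index) whose disjoint leaves are exactly segments lo..hi."""
    nodes, h = [], 0
    hi += 1  # switch to a half-open range [lo, hi)
    while lo < hi:
        if lo & 1:          # lo is a right child: take it, move right
            nodes.append((h, lo))
            lo += 1
        if hi & 1:          # hi - 1 is a left child: take it, move left
            hi -= 1
            nodes.append((h, hi))
        lo >>= 1
        hi >>= 1
        h += 1
    return nodes

def answer_interval(levels, lo, hi):
    """Sum the noisy counts of the (at most 2 log n) covering nodes."""
    return sum(levels[h][i] for h, i in cover(lo, hi))
```

With `scale` set to `2 * log2(n) / epsilon` the noise matches the magnitude used in the proof of Theorem 3.2; with `scale = 0` the query returns the exact segment sum, which is convenient for checking the cover logic.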

Theorem 3.2 (Theorem 1.2, Restated). The mechanism described above
answers interval queries. For any privacy and accuracy parameters $\epsilon, \beta > 0$,
the mechanism guarantees $\epsilon$-differential privacy. For any database x of size n,
with all but $\beta$ probability over the mechanism’s coins, all interval queries
(simultaneously) are answered with error $O\left(\frac{\log D+(\log^2 n)\cdot\log(1/\beta)}{\epsilon}\right)$. The running time
to produce the synopsis (which can later be used to answer any interval query)
is $\mathrm{poly}(n, \log D, \log(1/\epsilon), \log(1/\beta))$.

Proof. We run the partition algorithm with privacy parameter $(\epsilon/2)$ and error
parameter $(\beta/2)$. By Theorem 2.1, with all but $(\beta/2)$ probability, the online
partition algorithm outputs at most n segments, where the (true) number of
1’s in each segment is at most $O\left(\frac{\log D+\log(1/\beta)}{\epsilon}\right)$. We then build a tree of noisy
counts on top of these n segments, adding Laplace noise of magnitude $(2\log n/\epsilon)$
to each node’s true count, and releasing all of these noisy counts. The partition
itself is $(\epsilon/2)$-DP, and since each data item affects exactly log n counts in the
tree, these noisy counts (taken all together and as a function of the partition)
are $(\epsilon/2)$-DP. Thus, the algorithm’s output is altogether $\epsilon$-DP.

For an interval query $I = [i, j]$, we argue accuracy as follows:

The algorithm finds a “minimal cover”: a collection of segments $(S_k, \ldots, S_l)$
s.t. the union of these segments contains the interval $I$, and (by minimality) the
union of $(S_{k+1}, \ldots, S_{l-1})$ is contained in $I$ (we ignore the borderline cases where
$l - k < 1$, which are handled similarly). Let us denote the union of $(S_k, \ldots, S_l)$
by $I'$, so that $I \subseteq I'$. We have that:

1. The (true) sum of items in $I'$ is well approximated by the sum of noisy
counts computed by the algorithm. In particular, with all but $(\beta/2)$ probability,
the error in computing this sum (a sum of log n Laplacian RVs) is
$O\left(\frac{(\log^2 n)\cdot\log(1/\beta)}{\epsilon}\right)$.

2. The difference between the (true) counts in $I'$ and in $I$ is at most the sum of
counts in $S_k$ and $S_l$. This is because the only items that are in $I'$ but not in
$I$ are those in $S_k$ or $S_l$ (recall that $I$ contains the union of $(S_{k+1}, \ldots, S_{l-1})$).
By the accuracy of the partition algorithm, with all but $(\beta/2)$ probability,
this difference is at most $O\left(\frac{\log D+\log(1/\beta)}{\epsilon}\right)$.

By a union bound, we conclude that with all but $\beta$ probability, the total
error in computing the count on interval $I$ is $O\left(\frac{\log D+(\log^2 n)\cdot\log(1/\beta)}{\epsilon}\right)$.

3.3 Rectangle Queries 

To answer c-dimensional rectangle queries on a database $x \subseteq [1, D]^c$, we run
the partition algorithm on each “axis” of the input space separately. For each
dimension $a \in [1, c]$, we partition the line $[1, D]$ into (at most) n segments, where
for each of these segments, the number of input elements whose a-th coordinate
falls into that segment is bounded. I.e., we compute a privacy-preserving partition
$(S^a_1, \ldots, S^a_n)$, where for all $i$, the number of database elements whose a-th
coordinate falls into $S^a_i$ is bounded. For the remainder of the construction, we
will consider the partition of the multi-dimensional space $[1, D]^c$ into a collection
of rectangles:

$\{S^1_{i_1} \times \cdots \times S^c_{i_c}\}_{i_1, \ldots, i_c \in [1, n]}$

By the properties of the partition algorithm, these rectangles are disjoint and
cover the input space.

Multi-Dimensional Tree. We construct a “multi-dimensional tree” of counts over
the above partition. The construction is iterative, proceeding one dimension at
a time from 1 to c:

- The dimension-1 tree is a binary tree, whose leaves are the segments $\{S^1_i\}_{i \in [1, n]}$
(as in the intervals algorithm). Each node of this dimension-1 tree corresponds
to an interval $T^1$, a union of some number (a power of 2) of segments $\{S^1_i\}$
from the dimension-1 partition. Each such node contains a noisy count for the
number of items whose first coordinate falls in the interval $T^1$. The node also
contains a dimension-2 tree, which we call its “successor”.

- For $a \in [2, c]$ each dimension-a tree is a binary tree whose leaves are the
segments $\{S^a_i\}_{i \in [1, n]}$. The dimension-a tree has a “predecessor”, a
dimension-$(a-1)$ tree, corresponding to intervals $(T^1, \ldots, T^{a-1})$ in the first $(a-1)$
dimensions.
Each node in the dimension-a tree corresponds to an interval $T^a$, a union
of some number (a power of 2) of segments $\{S^a_i\}$ from the dimension-a partition.
Each such node contains a noisy count for the number of items s.t.
for all $i \in [1, a]$, their i-th coordinate falls in $T^i$. For $a < c$ (until the “final”
dimension), each node also contains a dimension-$(a+1)$ tree, which we call
its “successor”.

For privacy parameter $\epsilon$, the noise added to each count is drawn from a
Laplace random variable with magnitude $(4\log^c n/\epsilon)$. We view each node in the
above construction as specifying a c-dimensional rectangle $T = (T^1 \times \ldots \times T^c)$
(for nodes in dimension-a trees where $a < c$, the intervals $(T^{a+1}, \ldots, T^c)$ are
“full” and equal $[1, D]$). Each such node contains a noisy count of the number of
input elements that fall into this rectangle, i.e. of $|x \cap T|$. The size of this data
structure is roughly $n^c$. By pruning this tree, removing nodes with small noisy
counts (and their successors), we can obtain a data structure of size $O(n \cdot \log^c n)$
(whose construction also requires time $O(n \cdot \log^c n)$), see Remark 3.4 below.

The following two claims will be used in arguing privacy and accuracy: 

Claim. Adding or removing an element to the dataset only changes the counts
of at most $(2\log^c n)$ nodes in the multi-dimensional tree.

Proof. Let $x_j \in [1, D]^c$ be a data item. Let $(S^1_{i_1}, \ldots, S^c_{i_c})$ be the (unique) segments
of the partition s.t. the a-th coordinate of $x_j$ is in $S^a_{i_a}$ (for all $a \in [1, c]$). We
bound the number of nodes in the tree for which their corresponding rectangle T
includes $x_j$ (adding or removing $x_j$ will not affect the counts in any other nodes).
In the dimension-1 tree there are only log n such nodes: the leaf corresponding
to the segment $S^1_{i_1}$, and its ancestors in the tree. Now observe that for the other
nodes in the dimension-1 tree, their successors (and their successors) will never
correspond to rectangles that include $x_j$. For the log n nodes that do include
$x_j$, their successors are dimension-2 trees, and they each have log n nodes that
include $x_j$. Thus, we have $\log^2 n$ nodes in dimension-2 trees that include $x_j$. For
all other nodes, their successors will not include $x_j$. Continuing as above, we
have that in the dimension-a trees there are $\log^a n$ nodes that include $x_j$. We
conclude that in total, the number of nodes in the multi-dimensional tree that
include $x_j$ is bounded by:

$\sum_{a=1}^{c} \log^a n < 2\log^c n$
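The final inequality is a geometric-sum bound. A short derivation, under the assumption (not stated explicitly in the text) that $\log n \ge 2$:

```latex
\sum_{a=1}^{c} \log^{a} n
  \;=\; \log^{c} n \cdot \sum_{b=0}^{c-1} \frac{1}{\log^{b} n}
  \;\le\; \log^{c} n \cdot \sum_{b=0}^{c-1} 2^{-b}
  \;<\; 2 \log^{c} n .
```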

Claim. For any rectangle $R = (R^1 \times \ldots \times R^c) \subseteq [1, D]^c$, there exists a tight
“covering” of that rectangle using a set of at most $m = (2\log n)^c$ nodes
$\mathcal{T} = \{T_1, \ldots, T_m\}$ from the multi-dimensional tree. Taking $Q = \bigcup_{i=1}^{m} T_i$ we have:

1. R is no larger than Q, in particular $R \subseteq Q$.

2. Q is not “much” larger than R. In particular, for each dimension a there
exist segments $S^a_k, S^a_l$ (segments of the a-th partition) s.t. for any element
$y \in (Q \setminus R)$, for some $a \in [1, c]$ the a-th coordinate of y is in either $S^a_k$ or $S^a_l$
(and thus, by the properties of the partition algorithm, the size of $(Q \setminus R)$ is
not too large).

Proof. Similarly to the intervals algorithm, we begin with a separate “cover”
for the intervals that constitute each dimension of the rectangle R. As in the
intervals algorithm, for each dimension $a \in [1, c]$, there exists a collection $\mathcal{T}^a$ of
$2\log n$ intervals corresponding to nodes in the dimension-a tree that “cover” the
interval $R^a$ as follows. Taking $Q^a = \bigcup_{T \in \mathcal{T}^a} T$:

1. $R^a \subseteq Q^a$.

2. There exist two segments $(S^a_k, S^a_l)$ of the a-th partition, s.t. $(Q^a \setminus (S^a_k \cup S^a_l)) \subseteq R^a$.

Now the claim follows by taking $\mathcal{T}$, the set of tree nodes, to be $\mathcal{T} = (\mathcal{T}^1 \times \ldots \times \mathcal{T}^c)$.
This is a set of at most $(2\log n)^c$ nodes, as required. Moreover, taking:

$Q = \bigcup_{T \in \mathcal{T}} T = (Q^1 \times \ldots \times Q^c)$,

by the above properties of the cover on each dimension separately, we get that:

$R = (R^1 \times \ldots \times R^c) \subseteq (Q^1 \times \ldots \times Q^c) = \left(\bigcup_{T \in \mathcal{T}} T\right) = Q$.

Moreover, for each dimension a we denote $Q'_a = (Q^a \setminus (S^a_k \cup S^a_l))$. We have that
$Q' = Q'_1 \times \ldots \times Q'_c$ has the properties that $Q' \subseteq R$, and for every element
$y \in (Q \setminus Q')$, for some $a \in [1, c]$, its a-th coordinate is in $(S^a_k \cup S^a_l)$.

Answering Rectangle Queries. We use the multi-dimensional tree of noisy counts
described above to answer rectangle queries. Given a rectangle query
$R = (R^1 \times \ldots \times R^c) \subseteq [1, D]^c$, we decompose it into a “cover” $\mathcal{T}$ of $(2\log n)^c$ tree nodes as
promised in Claim 3.3. We answer the query R by adding up the noisy counts
for these m nodes and outputting this noisy sum. This can be done in time
$\mathrm{poly}(\log^c n)$.
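The product structure of the cover can be illustrated with a small sketch. It assumes each axis has already been partitioned into n segments (n a power of two) and represents a multi-dimensional tree node simply as a tuple of per-dimension nodes `(height, index)`; this representation and the names `cover_1d`, `cover_rectangle` and `cells` are hypothetical simplifications, not the paper's data structure.

```python
from itertools import product

def cover_1d(lo, hi):
    """Tree nodes (height, index) whose leaves are exactly segments lo..hi."""
    nodes, h = [], 0
    hi += 1  # half-open range [lo, hi)
    while lo < hi:
        if lo & 1:
            nodes.append((h, lo))
            lo += 1
        if hi & 1:
            hi -= 1
            nodes.append((h, hi))
        lo >>= 1
        hi >>= 1
        h += 1
    return nodes

def cover_rectangle(ranges):
    """Cartesian product of the per-dimension covers.

    ranges: one (lo, hi) pair of inclusive segment indices per dimension.
    Each returned tuple stands for one c-dimensional tree node.
    """
    return list(product(*(cover_1d(lo, hi) for lo, hi in ranges)))

def cells(nodes):
    """All grid cells (tuples of segment indices) covered by the given nodes."""
    out = []
    for node in nodes:
        per_dim = [range(i << h, (i + 1) << h) for h, i in node]
        out.extend(product(*per_dim))
    return out
```

For example, with n = 8 segments per axis and c = 2, `cover_rectangle([(1, 6), (2, 5)])` yields 4 x 2 = 8 nodes, well within the $(2\log n)^c = 36$ bound, and their cells tile the queried block of segments exactly once.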

Theorem 3.3 (Theorem 1.3, Restated). The mechanism described above
answers c-dimensional rectangle queries. For any privacy and accuracy parameters
$\epsilon, \beta > 0$, the mechanism guarantees $\epsilon$-differential privacy. With all but
$\beta$ probability over its coins, all rectangle queries (simultaneously) are answered
with error $O\left(\frac{(c^2 \cdot \log D)+(c \cdot (2\log n)^{1.5c+1} \cdot \log(1/\beta))}{\epsilon}\right)$.

Proof. By composition of DP mechanisms, privacy follows directly from: (i)
privacy of the Partition algorithm (for computing the c partitions), and (ii)
from Claim 3.3 and the fact that we add Laplace noise of magnitude $(4\log^c n/\epsilon)$
to each count.

For accuracy, observe that after we partition the axes, there are $n^{2c}$ possible
rectangle queries (rectangles whose covers are identical are essentially equivalent).
For each such query R, we release a noisy count for its cover $\mathcal{T}$. The noise
is a sum of (at most) $(2\log n)^c$ independent Laplace RVs, each of magnitude
$2\log^c n$. With all but $(\beta/2)$ probability, the maximal noise added to the count
of any of these covers is of magnitude at most $O\left(\frac{c \cdot (2\log n)^{1.5c+1} \cdot \log(1/\beta)}{\epsilon}\right)$ (see the
analysis for the sum of Laplacian RVs in [CSS11]). So for each rectangle R with
cover $\mathcal{T}$, the error in the noisy count for $\mathcal{T}$ is bounded.

We run the partition algorithm c times, each with privacy parameter $(\epsilon/2c)$
and error parameter $(\beta/2c)$. With all but $(\beta/2c)$ probability, the size of each
segment in each of the c partitions is at most $O\left(\frac{c \cdot (\log D+\log c+\log(1/\beta))}{\epsilon}\right)$. Every
point that is in $\mathcal{T}$ but not in R must have one of its coordinates be in a (fixed)
set of 2c such segments. Thus, by the second property of the cover $\mathcal{T}$ (see
Claim 3.3), the difference between the true counts of R and of $\mathcal{T}$ is at most
$O\left(\frac{c^2 \cdot (\log D+\log c+\log(1/\beta))}{\epsilon}\right)$. The error bound follows by a triangle inequality (and
a union bound).

Remark 3.4. The naive construction of the multi-dimensional tree requires time
(and size) $n^c$. We improve this running time dramatically by judiciously “pruning”
the tree. We take a threshold $t = O((\log n)^{c+1} \cdot \log(1/\beta))$, and as we construct
the multi-dimensional tree (starting with the dimension-1 tree), for any
node whose noisy count is smaller than $t$, we set that node to be “empty” (noisy
count 0), and do not continue to its children in the current tree, nor to its successor.
By this choice of $t$, w.h.p. over the noise, any node that is not marked as
empty corresponds to a rectangle that is not empty in the input database.

Now when using the noisy counts to reconstruct the answers to a given rectangle,
because we might be under-counting for all $(2\log n)^c$ of the nodes that we use
to “cover” the query, we obtain a slightly-larger error of $((\log n)^{O(c)} \cdot \log(1/\beta))$.
The advantage, however, is that the running time and the size of the multi-dimensional
tree are improved to $O(n \cdot \log^c n)$. To see this, recall that any node
that is not marked as “empty” must have at least 1 data item in its corresponding
rectangle. The bound on the tree size follows by induction over c (as does
the improved running time).
Abstract. Multilinear maps have become popular tools for designing
cryptographic schemes since a first approximate realisation candidate
was proposed by Garg, Gentry and Halevi (GGH). This construction was
later improved by Langlois, Stehlé and Steinfeld who proposed GGHLite
which offers smaller parameter sizes. In this work, we provide the first
implementation of such approximate multilinear maps based on ideal lattices.
Implementing GGH-like schemes naively would not allow instantiating
them for non-trivial parameter sizes. We hence propose a strategy
which reduces parameter sizes further and several technical improvements
to allow for an efficient implementation. In particular, since finding
a prime ideal when generating instances is an expensive operation,
we show how we can drop this requirement. We also propose algorithms
and implementations for sampling from discrete Gaussians, for inverting
in some cyclotomic number fields and for computing norms of ideals in
some cyclotomic number rings. Due to our improvements we were able
to compute a multilinear jigsaw puzzle for $\kappa = 52$ (resp. $\kappa = 38$) and
$\lambda = 52$ (resp. $\lambda = 80$).


Keywords: Algorithms • Implementation • Lattice-based cryptogra- 
phy • Cryptographic multilinear maps 


1 Introduction 

Multilinear maps, starting with bilinear ones, are popular tools for designing 
cryptosystems. When pairings were introduced to cryptography [Jou04], many 
previously unreachable cryptographic primitives, such as identity-based encryp- 
tion [BF03], became possible to construct. Maps of higher degree of linearity 
were conjectured to be hard to find - at least in the “realm of algebraic geom- 
etry” [BS03]. But in 2013, Garg, Gentry and Halevi [GGH13a] proposed a con- 
struction, relying on ideal lattices, of a so-called “graded encoding scheme” that 
approximates the concept of a cryptographic multilinear map. 
As expected, graded encoding schemes quickly found many applications in
cryptography. Already in [GGH13a] the authors showed how to generalise the 
3-partite Diffie-Hellman key exchange first constructed with cryptographic bilin- 
ear maps [BS03] to N parties: the protocol allows N users to share a secret key 
with only one broadcast message each. Furthermore, a graded encoding scheme 
also allows constructing very efficient broadcast encryption [BS03,BWZ14]: a 
broadcaster can encrypt a message and send it to a group where only a part of 
it (decided by the broadcaster before encrypting) will be able to read it. More- 
over, [GGH+13b] introduced indistinguishability obfuscation (iO) and functional 
encryption based on a variant of multilinear maps — multilinear jigsaw puz- 
zles — and some additional assumptions. 

The GGH Scheme. For a multilinearity parameter $\kappa$, the principle of the symmetric
GGH graded encoding scheme is as follows: given a ring $R$ and a principal
ideal $\mathcal{I}$ generated by a small secret element $g \in R$, a plaintext is a small element
of $R/\mathcal{I}$ and is viewed as a level-0 encoding. Given a level-0 encoding, it is
easy to increase the level to a higher level $i \le \kappa$, but it is assumed hard to come
back to an inferior level. The encodings are additively homomorphic at the same
level, and multiplicatively homomorphic up to $\kappa$ operations. The multiplication
of a level-$i$ and a level-$j$ encoding gives a level-$(i + j)$ encoding. Additionally,
a zero-testing parameter $p_{zt}$ allows testing if a level-$\kappa$ element is an encoding
of 0, and hence also allows testing if two level-$\kappa$ encodings are encoding the same
elements. Finally, the extraction procedure uses $p_{zt}$ to extract $\ell$ bits which are
a “canonical” representation of a ring element given its level-$\kappa$ encoding.

More precisely, in GGH we are given $R = \mathbb{Z}[X]/(X^n + 1)$, where n is a
power of 2, a secret element $z$ uniformly sampled in $R_q = R/qR$ (for a certain
prime number $q$), and a public element $y$ which is a level-1 encoding of 1 of the
form $[a/z]_q$ for some small $a$ in the coset $1 + \mathcal{I}$. We are also given $m$ level-$i$
encodings of 0 named $x_j^{(i)}$, for all $1 \le i \le \kappa$, and a zero-testing parameter $p_{zt}$.
To encode an element of $R/\mathcal{I}$ at level $i$ (for $i \le \kappa$), we multiply it by $y^i$ in $R_q$
(which gives an element of the form $[c/z^i]_q$, where $c$ is an arbitrary small coset
representative). Then, we add a linear combination of encodings of 0 at level $i$
of the form $\sum_j \rho_j x_j^{(i)}$ to it, where the $\rho_j$ are sampled from a certain discrete
Gaussian. This last step is the re-randomisation process and ought to ensure
that the analogue of the discrete logarithm problem is hard: going from level $i$
to level 0, for example by multiplying the encoding by $y^{-i}$. We will see later that
the encodings of zero made public for this step are a problem for the security of
the scheme.

The asymmetric variant of this scheme replaces levels by “groups” which are
identified with subsets of $\{1, \ldots, \kappa\}$. Addition of two elements in the same group
stays within the group; multiplying two elements of different groups with disjoint
index sets produces an element in the group defined by the union of their index
sets. These groups are realised by defining one $z_i$ for each index $1 \le i \le \kappa$ and
then dividing by the appropriate product of $z_i$. Given a group characterised by
$S \subseteq \{1, \ldots, \kappa\}$ we call the cardinality of $S$ its level.
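The bookkeeping of index sets in the asymmetric variant can be sketched with a toy model. This is purely illustrative: `ToyEncoding`, `add` and `mul` are hypothetical names, the "encoding" stores its plaintext in the clear, and nothing here reflects the actual lattice-based representation or offers any security.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyEncoding:
    value: int             # toy plaintext; a real encoding would hide this
    index_set: frozenset   # the group: a subset of {1, ..., kappa}

    @property
    def level(self):
        # The level of a group is the cardinality of its index set.
        return len(self.index_set)

def add(a, b):
    """Addition is only defined within the same group."""
    if a.index_set != b.index_set:
        raise ValueError("addition needs encodings in the same group")
    return ToyEncoding(a.value + b.value, a.index_set)

def mul(a, b):
    """Multiplication needs disjoint index sets and lands in their union."""
    if a.index_set & b.index_set:
        raise ValueError("multiplication needs disjoint index sets")
    return ToyEncoding(a.value * b.value, a.index_set | b.index_set)
```

Multiplying an encoding in group {1} by one in group {2} gives an encoding in group {1, 2}, i.e. at level 2; adding encodings from different groups is rejected.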
We can distinguish between GGH instances where encodings of zero are made
publicly available to allow anyone to encode elements and those where this is
not the case. The latter are also called “Multilinear Jigsaw Puzzles” and were
introduced in [GGH+13b] as a building block for indistinguishability obfuscation.
Such instances can be thought of as secret-key graded encoding schemes.
To distinguish the two cases, we denote those instances where no encodings of
zero are published as $\text{GGH}_s$. In such instances the secret elements $g$ and $z_i$
are required to encode elements at levels above zero.

Security. Already in [GGH13a] it was shown that an attacker can recover the
ideal $\langle g \rangle$ and the coset of $\langle g \rangle$ for any encoding at level $\le \kappa$ if encodings of
zero are made available. However, since these representatives of either $\langle g \rangle$ or the
cosets are not small, it was believed that these “weak discrete log” attacks would 
not undermine the central security goal of GGH - the analogue of the BDDH 
assumption. However, in [HJ15] it was shown that these attacks can be extended 
to recover short representatives of the cosets. As a consequence, if encodings of 
zero are published, then [HJ15] breaks the GGH security goals in many scenarios 
and it is not clear, at present, if and how GGH-like graded encoding schemes can 
be defended against such attacks. A candidate proposal to prevent weak discrete 
logarithm attacks was proposed in [CLT15, Appendix G], where the strategy is 
to change zero testing to make it non-linear in the encodings such that the attack 
does not work anymore. However, no security analysis was provided in [CLT15]
and revision 20150516:083005 of [CLT15] drops any mention of this candidate
fix. Hence, the status of GGH-like schemes where encodings of zero are published
is currently unclear. However, we note that $\text{GGH}_s$, where no encodings of zero
are made available, does not appear to be vulnerable to weak discrete log attacks 
if the freedom of an attacker to produce encodings of zero at the higher levels 
is also severely restricted to prevent generalisations of “zeroizing” attacks such 
as [CGH+15]. Such variants are the central building block of indistinguishability 
obfuscation, i.e. this case has important applications despite being more limited 
in functionality. Indeed, at present no known attack threatens the security of 
indistinguishability obfuscation constructed from graded encoding schemes such 
as GGH. 

Alternative Constructions. An alternative instantiation of graded encoding 
schemes over the integers promising practicality was proposed by Coron, Lep- 
oint and Tibouchi [CLT13]. This first proposal was also broken in polynomial 
time using public encodings of zero in [CHL+15]. The attack was later gener- 
alised in [CGH+15] and a candidate defence against these attacks was proposed 
in [CLT15]. The authors of [CLT15] also provided a C++ implementation of a 
heuristic variant of this scheme. They report that the Setup phase of a 7-partite
Diffie-Hellman key exchange takes 4528 s (parallelised on 16 cores), publishing a
share (Publish) takes 7.8 s per party (single core) and the final key derivation
(KeyGen) takes 23.9 s per party (single core) for a level of security $\lambda = 80$.
Instantiation. The implementation reported in [CLT15] is to date the only implementation
of a candidate graded encoding scheme. This is partly because instantiating
the original GGH construction is too costly in practice for anything but
toy instances. In 2014, Langlois, Stehlé and Steinfeld [LSS14a] proposed a variant
of GGH called GGHLite, improving the re-randomisation process of the
original scheme. It reduces the number $m$ of re-randomisers, public encodings of
zero, needed from $\Omega(n \log n)$ to 2 and also the size of the parameter of the
Gaussian used to sample multipliers $\rho_j$ during the re-randomisation phase from
$\tilde{O}(2^{\lambda} \lambda n^{4.5} \kappa)$ to $\tilde{O}(n^{5.5} \sqrt{\kappa})$. These improvements allow reducing the size of the
public parameters and improving the overall efficiency of the scheme. But even
though [LSS14a] made a step forward towards efficiency and in some cases no
public re-randomisation is required at all ($\text{GGH}_s$), GGH-like schemes are still
far from being practical.

Our contribution. Our main contribution is a first and efficient implementation 
of improved GGH-like schemes which we make publicly available under an open- 
source license. This implementation covers symmetric and asymmetric flavours 
and we allow encodings of zero to be published or not. However, since the security 
of GGH-like constructions is unclear when encodings of zero are published, we do 
not discuss this variant in this paper. We note, however, that our implementation 
provides a good basis for implementing any future fixes and improvements for 
GGH-based graded encoding schemes. 

Implementing GGH-like schemes efficiently such that non-trivial levels of
multilinearity and security can be achieved is not straightforward and to obtain
an implementation we had to address several issues. In particular, we contribute 
the following improvements to make GGH-like multilinear maps instantiable: 

• We show that we do not require $\langle g \rangle$ to be a prime ideal for the existing
proofs to go through. Indeed, sampling an element $g \in \mathbb{Z}[X]/(X^n + 1)$ such
that the ideal it generates is prime, as required by GGH and GGHLite, is a
prohibitively expensive operation. Avoiding this check is then a key step to
allow us to go beyond toy instances.

• We give a strategy to choose practical parameters for the scheme and extend 
the analysis of [LSS14a] to ensure the correctness of all the procedures of the 
scheme. Our refined analysis reduces the bitsize of q by a factor of about 4, 
which in turn reduces the required dimension n. 

• We apply the analyses from [CS97] to pick parameters to defend against lattice 
attacks. 

• For all steps during the instance generation we provide implementations and
algorithms which work in quasi-linear time and efficiently in practice. In particular,
we provide algorithms and implementations for inverting in some
cyclotomic number fields, for computing norms of ideals in some cyclotomic
number rings, for producing short representatives of elements modulo $\langle g \rangle$ and
for sampling from discrete Gaussians in $\tilde{O}(n)$. For the latter we use Ducas
and Nguyen’s strategy [Duc13]. Our implementation of these operations might
be of independent interest (cf. [LP15] for recent work on efficient sampling
from a discrete Gaussian distribution), which is why they are available as a
separate module in our code.

• We discuss our implementation and report on experimental results.

Table 1. Computing a $\kappa$-level asymmetric multilinear map with our implementation
without encodings of zero. Column $\lambda$ gives the minimum security level we accepted,
column $\lambda'$ gives the actually expected security level based on the best known attacks for
the given parameter sizes. Timings produced on Intel Xeon CPU E5-2667 v2 3.30 GHz
with 256 GB of RAM, parallelised on 16 cores, but not all operations took full advantage
of all cores. Setup gives the time for generating the GGH instance. Encode lists the time
it takes to reduce an element of $\mathbb{Z}_p$ with $p = \mathcal{N}(\mathcal{I})$ to a small element in $\mathbb{Z}[X]/(X^n + 1)$
modulo $\langle g \rangle$. Mult lists the time to multiply $\kappa$ elements. All times are wall times.

$\lambda$  $\kappa$  $\lambda'$  $n$       $\log q$  Setup     Encode   Mult      $\|\mathrm{enc}\|$
52         6         64.4        $2^{15}$  2117      114 s     26 s     0.05 s    8.3 MB
52         9         53.5        $2^{15}$  3086      133 s     25 s     0.12 s    12.1 MB
52         14        56.6        $2^{16}$  4966      634 s     84 s     0.62 s    38.8 MB
52         19        56.6        $2^{16}$  6675      762 s     75 s     1.38 s    52.2 MB
52         25        59.6        $2^{17}$  9196      2781 s    243 s    5.78 s    143.7 MB
52         52        62.7        $2^{18}$  19898     26695 s   1016 s   84.1 s    621.8 MB
80         6         155.2       $2^{16}$  2289      415 s     74 s     0.13 s    17.9 MB
80         9         86.7        $2^{16}$  3314      445 s     72 s     0.27 s    25.9 MB
80         14        120.9       $2^{17}$  5288      1525 s    252 s    1.38 s    82.6 MB
80         19        80.4        $2^{17}$  7089      1821 s    268 s    3.07 s    110.8 MB
80         25        138.8       $2^{18}$  9721      9595 s    967 s    13.52 s   303.8 MB
80         38        80.3        $2^{18}$  14649     20381 s   947 s    16.21 s   457.8 MB

Our results (cf. Table 1) are promising, as we manage to compute up to multilinearity
level $\kappa = 52$ (resp. $\kappa = 38$) at security level $\lambda = 52$ (resp. $\lambda = 80$) in
the asymmetric $\text{GGH}_s$ case. We note that much smaller levels of multilinearity
have been used to realise non-trivial functionality in the literature. For example,
[BLR+15] reports on comparisons between 16-bit encrypted values using a
9-linear map (however, this result holds in a generic multilinear map model). We
note that the results in Table 1, where no encodings of zero are made available,
are not directly comparable with those reported in [CLT15].

Technical Overview. Our implementation relies on FLINT [HJP14]. However,
we provide our own specialised implementations for operations in the ring of
integers of cyclotomic number fields where the degree is a power of two and
related rings as listed above.

Our variant of GGH foregoes checking if $g$ generates a prime ideal. During
instance generation [GGH13a,LSS14a] specify to sample $g$ such that $\langle g \rangle$ is a
prime ideal. This condition is needed in [GGH13a,LSS14a] to ensure that no non-zero
encoding passes the zero-testing test and to argue that the non-interactive
N-partite key exchange produces a shared key with sufficient entropy. We show
that for both arguments we can drop the requirement that $g$ generates a prime ideal.
This was already mentioned as a potential improvement in [Gar13, Section 6.3] but
not shown there. As rejection sampling until a prime ideal $\langle g \rangle$ is found is prohibitively
expensive due to the low density of prime ideals in $\mathbb{Z}[X]/(X^n + 1)$,
this allows speeding up instance generation such that non-trivial instances are
possible. We also provide fast algorithms and implementations for checking if
$\langle g \rangle \subset \mathbb{Z}[X]/(X^n + 1)$ is prime for applications which still require prime $\langle g \rangle$.

We also improve the size of the two parameters $q$ and $\ell$ compared to [LSS14a].
We first perform a finer analysis than [LSS14a], which allows us to reduce the size
of the parameter $q$ by a factor 2. Then, we introduce a new parameter $\xi$, which
controls what fraction of $q$ is considered “small”, i.e. passes the zero-testing test,
which reduces the size of $q$ further. This also reduces the number of bits $\ell$ extracted
from each coefficient. Indeed, instead of setting $\ell = 1/4 \log q - \lambda$ where $\lambda$ is the
security parameter, we set $\ell = \xi \log q - \lambda$ with $0 < \xi \le 1/4$. We then show that
for a good choice of $\xi$ this is enough to ensure the correctness of the extraction
procedure and the security of the scheme. Overall, our refined analysis allows

us to reduce the size of q « (3 nia^a') in [LSS14a] to q « (3ttJ erf + ^ 

which, in turn, allows reducing the dimension n. When no encodings of zero are 
published we simply set = 1 and apply the same analysis. 

Open Problems. The most pressing question at this point is whether GGH-like
constructions are secure. There exist no security proofs for any variant and recent
cryptanalysis results recommend caution. Even speculating that secure variants
of GGH-like multilinear maps can be found, performance is still an issue. While
we manage to compute approximate multilinear maps for relatively high levels of
$\kappa$ in this work, all known schemes are still at least quadratic in $\kappa$ which presents
a major obstacle to efficiency. Any improvement which would reduce this to
something linear in $\kappa$ would mean a significant step forward. Finally, establishing
better estimates for lattice reduction and tuning the parameter choices of our
schemes are areas of future work.

Roadmap. We give some preliminaries in Sect. 2. In Sect. 3 we describe the 
GGH-like asymmetric graded encoding schemes and the multilinear jigsaw puz- 
zles used for iO. In Sect. 4, we explain our modifications to GGH-like schemes, 
especially concerning the parameter q. We also recall a lattice attack to derive 
the parameter n and show that we do not require (g) to be prime. In Sect. 5, we 
give the details of our implementation. 

2 Preliminaries 

Lattices and Ideal Lattices. An m-dimensional lattice $L$ is an additive subgroup
of $\mathbb{R}^m$. A lattice $L$ can be described by its basis $B = \{b_1, \ldots, b_k\}$, with
$b_i \in \mathbb{R}^m$, consisting of $k$ linearly independent vectors, for some $k \le m$, called
the rank of the lattice. If $k = m$, we say that the lattice has full rank. The
lattice $L$ spanned by $B$ is given by $L = \{\sum_{i=1}^{k} c_i \cdot b_i,\ c_i \in \mathbb{Z}\}$. The volume of the
lattice $L$, denoted by vol($L$), is the volume of the parallelepiped defined by its
basis vectors. We have $\mathrm{vol}(L) = \sqrt{\det(B^T B)}$, where $B$ is any basis of $L$.
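The volume formula can be checked on small examples. The following is a minimal sketch using only the standard library; the helper names `gram`, `det` and `volume` are hypothetical, and the basis is passed as a list of vectors, so the Gram matrix of pairwise inner products plays the role of $B^T B$.

```python
from fractions import Fraction
from math import sqrt

def gram(basis):
    """Gram matrix: entry (i, j) is the inner product of b_i and b_j."""
    return [[sum(Fraction(x) * Fraction(y) for x, y in zip(u, v))
             for v in basis] for u in basis]

def det(matrix):
    """Determinant by exact Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    n, d = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            d = -d  # a row swap flips the sign of the determinant
        d *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return d

def volume(basis):
    """vol(L) = sqrt(det(B^T B)) for a list of linearly independent vectors."""
    return sqrt(det(gram(basis)))
```

The formula also covers lattices that are not full rank: for the rank-2 basis `[(2, 0, 0), (0, 3, 0)]` in three dimensions, the Gram matrix is diagonal with entries 4 and 9.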

For n a power of two, let $f(X) \in \mathbb{Z}[X]$ be a monic polynomial of degree
n (in our case, $f(X) = X^n + 1$). Then, the polynomial ring $R = \mathbb{Z}[X]/f(X)$
is isomorphic to the integer lattice $\mathbb{Z}^n$, i.e. we can identify an element
$u(X) = \sum_{i=0}^{n-1} u_i \cdot X^i \in R$ with its corresponding coefficient vector $(u_0, u_1, \ldots, u_{n-1})$. We
also define $R_q = R/qR = \mathbb{Z}_q[X]/(X^n + 1)$ (isomorphic to $\mathbb{Z}_q^n$) for a large prime
$q$ and $K = \mathbb{Q}[X]/(X^n + 1)$ (isomorphic to $\mathbb{Q}^n$).

Given an element $g \in R$, we denote by $\mathcal{I}$ the principal ideal in $R$ generated
by $g$: $(g) = \{g \cdot u : u \in R\}$. The ideal $(g)$ is also called an ideal lattice and
can be represented by its $\mathbb{Z}$-basis $(g, X \cdot g, \dots, X^{n-1} \cdot g)$. We denote by $\mathcal{N}(g)$ its
norm. For any $y \in R$, let $[y]_g$ be the reduction of $y$ modulo $\mathcal{I}$. That is, $[y]_g$ is
the unique element in $R$ such that $y - [y]_g \in (g)$ and $[y]_g = \sum_{i=0}^{n-1} y_i X^i g$, with
$y_i \in [-1/2, 1/2)$ for all $0 \le i \le n - 1$. Following [LSS14a] we abuse notation and let
$\sigma_n(b)$ denote the last singular value of the matrix $\mathrm{rot}(b) \in \mathbb{Z}^{n \times n}$, for any $b \in \mathcal{I}$.
For $z \in R$, we denote by $\mathrm{MSB}_\ell(z) \in \{0, 1\}^{\ell \cdot n}$ the $\ell$ most significant bits of each of
the $n$ coefficients of $z$ in $R$.

Gaussian Distributions. For a vector $c \in \mathbb{R}^n$ and a positive parameter
$\sigma \in \mathbb{R}$, we define the Gaussian distribution of centre $c$ and width parameter $\sigma$ as
$\rho_{\sigma,c}(x) = \exp(-\pi \|x - c\|^2 / \sigma^2)$, for all $x \in \mathbb{R}^n$. This notion can be extended to
ellipsoid Gaussian distributions by replacing the parameter $\sigma$ with the square root
of the covariance matrix $\Sigma = BB^t \in \mathbb{R}^{n \times n}$ with $\det(\Sigma) \ne 0$. We define it
by $\rho_{\sqrt{\Sigma},c}(x) = \exp(-\pi \cdot (x - c)^t (B^t B)^{-1} (x - c))$, for all $x \in \mathbb{R}^n$. For $L$ a
subset of $\mathbb{Z}^n$, let $\rho_{\sigma,c}(L) = \sum_{x \in L} \rho_{\sigma,c}(x)$. Then, the discrete Gaussian
distribution over $L$ with centre $c$ and standard deviation $\sigma$ (resp. $\sqrt{\Sigma}$) is defined as
$D_{L,\sigma,c}(y) = \rho_{\sigma,c}(y) / \rho_{\sigma,c}(L)$ for all $y \in L$. We use the notations $\rho_\sigma$ (resp. $\rho_{\sqrt{\Sigma}}$) and $D_{L,\sigma}$
(resp. $D_{L,\sqrt{\Sigma}}$) when $c$ is 0.
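
For illustration, the one-dimensional case of these definitions can be sketched in Python. This is a toy example and not the sampler used in the implementation described later: the function names, the choice $L = \mathbb{Z}$ and the truncation of the support at `tail` widths are assumptions made only for this sketch.

```python
import math

def rho(x, sigma, c=0.0):
    """Gaussian weight rho_{sigma,c}(x) = exp(-pi * (x - c)^2 / sigma^2), dimension 1."""
    return math.exp(-math.pi * (x - c) ** 2 / sigma ** 2)

def discrete_gaussian_pmf(sigma, c=0.0, tail=12):
    """Probabilities of D_{Z,sigma,c} on a truncated support around c.

    The support is cut at c +/- tail * sigma (an assumption of this sketch);
    the neglected probability mass is negligible for tail = 12.
    """
    lo = math.floor(c - tail * sigma)
    hi = math.ceil(c + tail * sigma)
    weights = {x: rho(x, sigma, c) for x in range(lo, hi + 1)}
    total = sum(weights.values())  # approximates rho_{sigma,c}(Z)
    return {x: w / total for x, w in weights.items()}

pmf = discrete_gaussian_pmf(sigma=3.0)
```

The dictionary `pmf` maps each integer in the truncated support to $\rho_{\sigma,c}(y)/\rho_{\sigma,c}(L)$, which mirrors the definition of $D_{L,\sigma,c}$ above.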

Finally, for a fixed $Y = (y_1, y_2) \in R^2$, we define $\tilde{\mathcal{E}}_{Y,s} = y_1 D_{R,s} + y_2 D_{R,s}$
as the distribution induced by sampling $u = (u_1, u_2) \in R^2$ from a discrete
spherical Gaussian with parameter $s$, and outputting $y = y_1 u_1 + y_2 u_2$. It is shown
in [LSS14a, Theorem 5.1] that if $Y \cdot R^2 = \mathcal{I}$ and $s \ge \max(\|g^{-1} y_1\|_\infty, \|g^{-1} y_2\|_\infty) \cdot
n \cdot \sqrt{2 \log(2n(1 + 1/\varepsilon))/\pi}$ for $\varepsilon \in (0, 1/2)$, this distribution is statistically close
to the Gaussian distribution $D_{\mathcal{I}, sY^T}$.

3 GGH-like Asymmetric Graded Encoding Scheme 

We now recall the definitions given in [GGH+13b, Section 2.2] for the notions of 
Jigsaw specifier, Multilinear Form and Multilinear Jigsaw puzzle. 

Definition 1 ([GGH+13b, Definition 5]). A Jigsaw specifier is a tuple $(\kappa, \ell, A)$
where $\kappa, \ell \in \mathbb{Z}^+$ are parameters and $A$ is a probabilistic circuit with the following
behavior: On input a prime number $q$, $A$ outputs the prime $q$ and an ordered set
of $\ell$ pairs $(S_1, a_1), \dots, (S_\ell, a_\ell)$ where each $a_i \in \mathbb{Z}_q$ and each $S_i \subseteq [\kappa]$.


Implementing Candidate Graded Encoding Schemes from Ideal Lattices 759 


Definition 2 ([GGH+13b, Definition 6 and 7]). A Multilinear Form is a
tuple $\mathcal{F} = (\kappa, \ell, \Pi, F)$ where $\kappa, \ell \in \mathbb{Z}^+$ are parameters and $\Pi$ is a circuit with $\ell$
input wires, made out of binary and unary gates. $F$ is an assignment of an index
set $I \subseteq [\kappa]$ to every wire of $\Pi$. A multilinear form must satisfy the constraints given
in the original definition (on gates, and the output wire is assigned to $[\kappa]$).

We say that a Multilinear Form $\mathcal{F} = (\kappa', \ell', \Pi, F)$ is compatible with $X =
((S_1, a_1), \dots, (S_\ell, a_\ell))$ if $\kappa = \kappa'$, $\ell = \ell'$ and the input wires of $\Pi$ are assigned to
the sets $S_1, \dots, S_\ell$. The evaluation of $\mathcal{F}$ on $X$ then performs arithmetic operations
on the inputs depending on the gates. We say that the evaluation succeeds if the
final output is $([\kappa], 0)$.
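
The evaluation of a form on plaintext pairs $(S_i, a_i)$ can be illustrated with a small Python sketch. The gate rules used here (addition requires identical index sets, multiplication requires disjoint index sets and outputs their union) follow the behaviour of Add and Mult in graded encoding schemes; the function names, the toy modulus and the example form are assumptions of this sketch and not part of [GGH+13b].

```python
def add(x, y, q):
    """Addition gate: both inputs must carry the same index set."""
    (s1, a1), (s2, a2) = x, y
    if s1 != s2:
        raise ValueError("addition requires identical index sets")
    return (s1, (a1 + a2) % q)

def mul(x, y, q):
    """Multiplication gate: index sets must be disjoint; the output gets their union."""
    (s1, a1), (s2, a2) = x, y
    if s1 & s2:
        raise ValueError("multiplication requires disjoint index sets")
    return (s1 | s2, (a1 * a2) % q)

def neg(x, q):
    """Unary negation gate: the index set is unchanged."""
    s, a = x
    return (s, (-a) % q)

def succeeds(output, kappa):
    """The evaluation succeeds if the final output is ([kappa], 0)."""
    s, a = output
    return s == frozenset(range(1, kappa + 1)) and a == 0

# Toy instance (hypothetical values): q = 101, kappa = 2, form x1 * x2 - x3.
q, kappa = 101, 2
x1 = (frozenset({1}), 3)
x2 = (frozenset({2}), 5)
x3 = (frozenset({1, 2}), 15)
result = add(mul(x1, x2, q), neg(x3, q), q)
```

Here `succeeds(result, kappa)` holds because $3 \cdot 5 - 15 = 0$ and the output wire carries the full index set $[\kappa]$.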

We now define the Multilinear Jigsaw Puzzles. 

Jigsaw Generator: JGen($\lambda, \kappa, \ell, A$) → ($q$, $X$, puzzle). This algorithm takes
as input $\lambda$, and a Jigsaw specifier $(\kappa, \ell, A)$. It outputs a prime $q$, a private
output $X$ and a public output puzzle. The generator uses a pair of PPT
algorithms JGen = (InstGen, Encode).

InstGen($\lambda, \kappa$) → ($q$, params, $s$). This algorithm takes $\lambda$ and $\kappa$ as inputs and
outputs ($q$, params, $s$), where $q$ is a prime of size at least $2^\lambda$, params is a
description of public parameters, and $s$ is a secret state to pass to the
encoding algorithm.

Encode($q$, params, $s$, $(S, a)$) → $(S, u)$. The encoding algorithm takes as inputs
the prime $q$, the parameters params, the secret state $s$, and a pair $(S, a)$
with $S \subseteq [\kappa]$ and $a \in \mathbb{Z}_q$ and outputs $u$, an encoding of $a$ relative to $S$.

More precisely, the algorithm runs the Jigsaw specifier on input $q$ to get $\ell$
pairs $(S_1, a_1), \dots, (S_\ell, a_\ell)$. It then encodes all the plaintext elements by using
the Encode algorithm on each $(S_i, a_i)$, which returns $(S_i, u_i)$. We have:

$X = (q, (S_1, a_1), \dots, (S_\ell, a_\ell))$ and puzzle = (params, $(S_1, u_1), \dots, (S_\ell, u_\ell)$).

Jigsaw Verifier: JVer(puzzle, $\mathcal{F}$) → {0, 1}. This algorithm takes as input the
public output puzzle of a Jigsaw Generator, and a multilinear form $\mathcal{F}$. It
outputs either accept (1) or reject (0).

Correctness. For an output ($q$, $X$, puzzle) and a form $\mathcal{F}$ compatible with $X$, we
say that the verifier JVer is correct if either the evaluation of $\mathcal{F}$ on $X$ succeeds
and JVer(puzzle, $\mathcal{F}$) = 1 or the evaluation fails and JVer(puzzle, $\mathcal{F}$) = 0. We
require that with high probability over the randomness of the generator, the
verifier will be correct on all forms.

Security. The hardness assumption for the Multilinear Jigsaw puzzle requires that
for two different polynomial-size families of Jigsaw Specifiers $\{(\kappa_\lambda, \ell_\lambda, A_\lambda)\}_{\lambda \in \mathbb{Z}^+}$
and $\{(\kappa_\lambda, \ell_\lambda, A'_\lambda)\}_{\lambda \in \mathbb{Z}^+}$ the public output of the Jigsaw Generator on $(\kappa_\lambda, \ell_\lambda, A_\lambda)$
will be computationally indistinguishable from the public output of the Jigsaw
Generator on $(\kappa_\lambda, \ell_\lambda, A'_\lambda)$.

3.1 Using GGH to Construct Jigsaw Puzzles

In Fig. 1, we describe a GGH-like asymmetric graded encoding scheme without 
encodings of zero based on the definition of GGHLite from [LSS14a]. Below, we 
explain how to use those procedures to construct the Jigsaw Generator, described 
in [GGH+13b, Appendix A]. 


• Instance generation. InstGen($1^\lambda, 1^\kappa$): Given security parameter $\lambda$ and multilinearity
  parameter $\kappa$, determine scheme parameters $n, q, \sigma, \sigma', \ell_{g^{-1}}, \ell_b, \ell$ as in [LSS14a].
  Then proceed as follows:
    • Sample $g \leftarrow D_{R,\sigma}$ until $\|g^{-1}\| \le \ell_{g^{-1}}$ and $\mathcal{I} = (g)$ is a prime ideal. Define
      encoding domain $R_g = R/(g)$.
    • Sample $z_i \leftarrow U(R_q)$ for all $0 < i \le \kappa$.
    • Sample $h \leftarrow D_{R,\sqrt{q}}$ and define the zero-testing parameter $p_{zt} = [\frac{h}{g} \prod_{i=1}^{\kappa} z_i]_q$.
    • Return public parameters params = $(n, q, \ell)$ and $p_{zt}$.

• Encode at level-0. Enc0(params, $g$, $e$): Compute a small representative $e' = [e]_g$
  and sample an element $e'' \leftarrow D_{e' + \mathcal{I}, \sigma'}$. Output $e''$.

• Encode in group $\{i\}$. Enc(params, $z_i$, $e$): Given parameters params, $z_i$, and a
  level-0 encoding $e \in R$, output $[e/z_i]_q$.

• Adding encodings. Add(params, $u_1$, $u_2$): Given encodings $u_1 = [c_1/(\prod_{i \in S} z_i)]_q$
  and $u_2 = [c_2/(\prod_{i \in S} z_i)]_q$ with $S \subseteq \{1, \dots, \kappa\}$:
    • Return $u = [u_1 + u_2]_q$, an encoding of $[c_1 + c_2]_g$ in the group $S$.

• Multiplying encodings. Mult(params, $u_1$, $u_2$): Let $S_1 \subset [\kappa]$, $S_2 \subset [\kappa]$ with
  $S_1 \cap S_2 = \emptyset$, given an encoding $u_1 = [c_1/(\prod_{i \in S_1} z_i)]_q$ and an encoding
  $u_2 = [c_2/(\prod_{i \in S_2} z_i)]_q$:
    • Return $u = [u_1 \cdot u_2]_q$, an encoding of $[c_1 \cdot c_2]_g$ in $S_1 \cup S_2$.

• Zero testing at level $\kappa$. isZero(params, $p_{zt}$, $u$): Given parameters params, a
  zero-testing parameter $p_{zt}$, and an encoding $u = [c/(\prod_{i=1}^{\kappa} z_i)]_q$ in the group $[\kappa]$,
  return 1 if $\|[p_{zt} u]_q\|_\infty < q^{3/4}$ and 0 else.


Fig. 1. GGH-like asymmetric graded encoding scheme adapted from [LSS14a].

Jigsaw Generator. The Jigsaw Generator uses InstGen to generate all the
public (params and $p_{zt}$) and secret parameters of the multilinear map. Each
level of the multilinear map is associated with a subset of the set $[\kappa]$.
To create the puzzle pieces, which are encodings of some elements of $R$ at
different levels, the Generator simply encodes some random elements at level
$S \subseteq [\kappa]$; those are given as puzzle.

Jigsaw Verifier. The verifier is given the public parameters params and $p_{zt}$,
a valid form $\Pi$ (which is defined in [GGH+13b, Def. 6] as a circuit made
of binary and unary gates) and puzzle, an input for $\Pi$ (which consists of some
encodings). The verifier then evaluates $\Pi$ on this input using Add for
addition gates and Mult for multiplication gates. The verifier must succeed
if the evaluation of $\mathcal{F}$ on $X$ succeeds, which means that the final output of
the evaluation is an encoding of zero at level $\kappa$. The verifier invokes the
zero-testing procedure, and outputs 1 if the test passes, 0 otherwise.

4 Modifications to and Parameters for GGH-like Schemes 

In this section, we first show that we do not require a prime $(g)$ and then describe
a method which allows us to reduce the size of two parameters: the modulus $q$ and
the number $\ell$ of extracted bits. In Sect. 4.3 we then describe the lattice attack
against the scheme which we use to pick the dimension $n$. Finally, we describe
our strategy to choose parameters that satisfy all these constraints.

4.1 Non-prime ( g ) 

Both GGHLite and GGH-like jigsaw puzzles as specified in Fig. 1 require
sampling a $g$ such that $(g)$ is a prime ideal. However, finding such a $g$ is
prohibitively expensive. While checking whether an individual $(g)$ is a prime
ideal is asymptotically not slower than polynomial multiplication, finding such
a $g$ requires running this check often. The probability that an element generates
a prime ideal is assumed to be roughly $1/n^c$ for some constant $c > 1$ [Gar13,
Conjecture 5.18], so we expect to run this check $n^c$ times. Hence, the overall
complexity is at least quadratic in $n$, which is too expensive for anything but toy
instances.

Primality of $(g)$ is used in two proofs. Firstly, to ensure that after multiplying
$\kappa + 1$ elements in $R_g$ the product contains enough entropy. This is used to argue
entropy of the $N$-partite non-interactive key exchange. Secondly, to prove that
$c \cdot h/g$ is big if $c, h \notin (g)$ (cf. Lemma 2). Below, we show that we can relax the
conditions on $g$ for these two arguments to still go through, which then allows
us to drop the condition that $(g)$ should be prime. We note, though, that some
other applications might still require $(g)$ to be prime and that future attacks might
find a way to exploit non-prime $(g)$.

Entropy of the Product. The next lemma shows that excluding prime factors
$\le 2N$, where $N = \kappa + 1$, and guaranteeing $\mathcal{N}(g) \ge 2^n$ is sufficient to ensure $2\lambda$ bits
of entropy in a product of $\kappa + 1$ elements in $R_g$ with overwhelming probability. We
note that both conditions hold with high probability, are easy to check and are indeed
checked in our implementation.
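
The two conditions on the norm are indeed cheap to test. A minimal Python sketch, assuming the norm $\mathcal{N}(g)$ has already been computed as an integer, is given below; the function names are hypothetical and plain trial division stands in for whatever method an actual implementation uses.

```python
def has_small_prime_factor(p, bound):
    """True if p is divisible by some prime <= bound.

    Trial division by every integer in [2, bound] suffices: if a composite
    d divides p, so does a prime factor of d, which is also <= bound.
    """
    return any(p % d == 0 for d in range(2, bound + 1))

def entropy_conditions_hold(norm_g, n, num_parties):
    """Check N(g) >= 2^n and that N(g) has no prime factor <= 2N."""
    return norm_g >= 2 ** n and not has_small_prime_factor(norm_g, 2 * num_parties)
```

For example, with $n = 4$ and $N = 3$ a norm of 17 passes both tests, whereas 25 fails because of its prime factor 5.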

Lemma 1. Let $\kappa \ge 2$, $\lambda$ be the security parameter and $g \in \mathbb{Z}[X]/(X^n + 1)$ with
norm $p = \mathcal{N}(g) \ge 2^n$ such that $p$ has no prime factors $\le 2\kappa + 2$, and such
that $n \ge \kappa \cdot \lambda \cdot \log(\lambda)$. Then, with overwhelming probability, the product of $\kappa + 1$
uniformly random elements in $R_g$ has at least $\kappa \cdot \lambda \cdot \log(\lambda)/4$ bits of entropy.

Proof. Write $p = \prod_{i=1}^{r} p_i^{e_i}$ where the $p_i$ are distinct primes and $e_i \ge 1$ for all $i$. Let
us consider the set $S = \{i \in \{1, \dots, r\} : e_i = 1\}$. Then, following [CDKD14] we
define $p_s = \prod_{i \in S} p_i$ as the square-free part of $p$. Asymptotically, it holds that
$\#\{p \le x : p/p_s > p_s\}$ is $c x^{3/4}$ for some computable constant $c$ (cf. [CDKD14]).
Since in our case we have $x \ge 2^n$, this implies that with overwhelming probability
it holds that $p_s \ge \sqrt{p}$ and hence $\log(p_s) \ge n/2$.

By the Chinese Remainder Theorem, $R_g$ is isomorphic to $F_1 \times \dots \times F_r$ where
each "slot" $F_i = \mathbb{Z}_{p_i^{e_i}}$. The set of $F_i$ for $i \in S$ corresponds to the square-free
part of $p$. Those $F_i$ are fields, and each of them has order $p_i > 2N$, which means
that a random element in such an $F_i$ is zero with probability $1/p_i$. In those slots,
the product of $N$ elements has $E_s$ bits of entropy, where

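$$E_s \ge \sum_{i \in S} \left(1 - \frac{N}{p_i}\right) \cdot \log(p_i).$$

First, as $p_i \ge 2N$ for all $i \in S$, the quotient $N/p_i \le 1/2$ and then $\left(1 - \frac{N}{p_i}\right) \ge 1/2$
for all $i \in S$. This implies that

$$E_s \ge \frac{1}{2} \sum_{i \in S} \log(p_i) = \frac{1}{2} \log\Big(\prod_{i \in S} p_i\Big) = \frac{1}{2} \log(p_s).$$

Because $\log(p_s) \ge n/2$, we conclude that $E_s \ge n/4 \ge \kappa \cdot \lambda \cdot \log(\lambda)/4$. □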

Probability of False Positive. It remains to be shown that we can ensure that
there are no false positives even if $(g)$ is not prime. In [GGH13a, Lemma 3] false
positives are ruled out as follows. Let $u = [c/z^\kappa]_q$ where $c$ is a short element in
some coset of $\mathcal{I}$, and let $w = [p_{zt} \cdot u]_q$; then we have $w = [c \cdot h/g]_q$. The first step
in [GGH13a] is to suppose that $\|g \cdot w\|$ and $\|c \cdot h\|$ are each at most $q/2$; then,
since $g \cdot w = c \cdot h \bmod q$, we have that $g \cdot w = c \cdot h$ exactly. We also have an
equality of ideals: $(g) \cdot (w) = (c) \cdot (h)$, and then several cases are possible. If $(g)$
is prime as in [GGH13a, Lemma 3], then $(g)$ divides either $(c)$ or $(h)$ and either
$c$ or $h$ is in $(g)$. As, by construction, neither of them is in $(g)$ if $c$ is not in $\mathcal{I}$, either
$\|g \cdot w\|$ or $\|c \cdot h\|$ is more than $q/2$. Using this, they conclude that there is no
small $c$ (not in $\mathcal{I}$) such that $w$ is small enough to be accepted by the zero-test.

Our approach is to simply notice that all we require is that $(g)$ and $(h)$
are co-prime. Checking if $(g)$ and $(h)$ are co-prime can be done by checking
$\gcd(\mathcal{N}(g), \mathcal{N}(h)) = 1$. However, computing $\mathcal{N}(h)$ is rather costly because $h$ is
sampled from $D_{R,\sqrt{q}}$ and hence has a large norm. To deal with this issue we
notice that if $\gcd(\mathcal{N}(g), \mathcal{N}(h)) \ne 1$ then we also have $\gcd(\mathcal{N}(g), \mathcal{N}(h \bmod g)) \ne 1$,
which can be verified with a simple calculation. Now, interpreting $h \bmod g$ as
"a small representative of $h$ modulo $g$", we can compute $h \bmod g$ as $h - g \cdot \lfloor g^{-1} \cdot h \rceil$,
which produces an element of size $\approx \sqrt{n} \cdot \|g\|$. We can use this observation to
reduce the complexity of checking if $(g)$ and $(h)$ are co-prime to computing two
norms for elements of size $\|g\|$ and $\approx \sqrt{n} \cdot \|g\|$ and taking their gcd. Furthermore,
this condition holds with high probability, i.e. we only have to perform this test
$O(1)$ times. Indeed, by ruling out likely common prime factors first, we expect
to run this test exactly once. Hence, checking co-primality of $(g)$ and $(h)$ is much
cheaper than finding a prime $(g)$ but still rules out false positives.

Finally, we note that recent proposals of indistinguishability obfuscation from
multilinear maps [Zim15, AB15] require composite order maps. These are not
the maps we are concerned with here as in [Zim15, AB15] it is assumed that
the factorisation of $(g)$ is known. However, we note that our techniques and
implementation easily extend to this case by considering $g = g_1 \cdot g_2$ for known
co-prime $g_1$ and $g_2$.
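
The co-primality test for $(g)$ and $(h)$ via $h \bmod g = h - g \cdot \lfloor g^{-1} \cdot h \rceil$ can be sketched in Python for toy sizes. This sketch is an illustration only: it evaluates polynomials at the complex roots of $X^n + 1$ in floating point, which is safe only for very small $n$ and small coefficients, and all function names are hypothetical.

```python
import cmath
from math import gcd

def roots(n):
    """Complex roots of X^n + 1."""
    return [cmath.exp(1j * cmath.pi * (2 * k + 1) / n) for k in range(n)]

def embed(f):
    """Evaluate f (coefficient list, constant term first) at all roots of X^n + 1."""
    return [sum(c * z ** i for i, c in enumerate(f)) for z in roots(len(f))]

def norm(f):
    """Norm of f: product of its evaluations, rounded to the nearest integer."""
    prod = 1
    for v in embed(f):
        prod *= v
    return round(prod.real)

def negacyclic_mul(a, b):
    """Schoolbook product in Z[X]/(X^n + 1)."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                c[i + j] += ai * bj
            else:
                c[i + j - n] -= ai * bj  # X^n = -1
    return c

def reduce_mod(h, g):
    """Small representative h - g * round(g^{-1} * h)."""
    n = len(g)
    quotient = [hv / gv for hv, gv in zip(embed(h), embed(g))]
    # Interpolate g^{-1} * h back to coefficients and round each one.
    t = [round((sum(v * z ** (-j) for v, z in zip(quotient, roots(n))) / n).real)
         for j in range(n)]
    return [hi - x for hi, x in zip(h, negacyclic_mul(g, t))]

def coprime_norms(g, h):
    """Sufficient test: gcd(N(g), N(h mod g)) == 1."""
    return gcd(norm(g), norm(reduce_mod(h, g))) == 1
```

With $n = 2$, $g = 2 + X$ and $h = 7 + 3X$ the reduced element is $1$, so the norms are co-prime. For $h = 4 + 7X = g \cdot (3 + 2X)$ the reduced element is $0$ and the test fails, as expected.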


4.2 Reducing the Size of q 

In this section, we show how to reduce $q$, for which we consider the case where
re-randomisers are published for level-1 but no other levels. This matches the
requirements of the $N$-partite Diffie-Hellman key exchange but not the Jigsaw
puzzle case. However, when no re-randomisers are published we may simply set
$\sigma_1^\star = 1$ and apply the same analysis. Hence, assuming that re-randomisers are
published fits our framework in all cases and makes our analysis compatible with
previous work. We note that the analysis can be easily generalised to accommodate
re-randomisers at higher levels than one by increasing $q$ to accommodate
"numerator growth".

The size of $q$ is driven by both correctness and security considerations. To
ensure the correctness of the zero-testing procedure, [LSS14a] showed the two
following lower bounds on $q$. Equation 1 implies that false negatives do not exist,
and Eq. 2 implies that the probability of false positive occurrence is negligible:

$$q > \max\left((n \ell_{g^{-1}})^8,\ (3 n^{3/2} \sigma_1^\star \sigma')^{8\kappa}\right), \qquad (1)$$

$$q > (2 n \sigma)^4. \qquad (2)$$

The strongest constraint for $q$ is given by the inequality $q > (3 n^{3/2} \sigma_1^\star \sigma')^{8\kappa}$. It
comes from the fact that for any level-$\kappa$ encoding $u$ of 0, the inequality $\|[p_{zt} u]_q\|_\infty <
q^{3/4}$ has to hold. The condition is needed for the correctness of zero-testing and
extraction.


New parameter $\xi$. The choice suggested in [LSS14a] is to extract $\ell = \log(q)/4 - \lambda$
bits from each element of the level-$\kappa$ encoding. We show that this supplies
much more entropy than needed and that we can sample a smaller fraction,
$\ell = \xi \log(q) - \lambda$ bits. The equation for $q$ can be rewritten in terms of the variable
$\xi$, by setting the initial condition $\|[p_{zt} u]_q\|_\infty < q^{1-\xi}$.


Lemma 2 (Adapted from Lemma A.1 in [LSS14b]). Let $g \in R$ and
$\mathcal{I} = (g)$, let $c, h \in R$ such that $c \notin \mathcal{I}$, $(g)$ and $(h)$ are co-prime, $\|c \cdot h\| < q/2$ and
$q > (2tn\sigma)^{1/\xi}$ for some $t \ge 1$ and any $0 < \xi \le 1/4$. Then $\|[c \cdot h/g]_q\| > t \cdot q^{1-\xi}$.

Proof. From [GGH13a, Lemma 3] and the discussion in Sect. 4.1 we know that
since $\|c \cdot h\| < q/2$ we must have $\|g \cdot [c \cdot h/g]_q\| > q/2$ if $(g)$ and $(h)$ are co-prime
(note that $c \cdot h \ne g \cdot [c \cdot h/g]_q$ in $R/(X^n + 1)$). So we have

$$\|g \cdot [c \cdot h/g]_q\| > q/2 \implies \sqrt{n}\,\|g\| \cdot \|[c \cdot h/g]_q\| > q/2 \implies \|[c \cdot h/g]_q\| > q/(2n\sigma).$$

We have $t \cdot q^{1-\xi} = t \cdot q/q^{\xi} < t \cdot q/(2tn\sigma) = q/(2n\sigma)$ and the claim follows.

Correctness of Zero-Testing. We can obtain a tighter bound on $q$ by refining
the analysis in [LSS14a]. Recall that

$$\|[p_{zt} u]_q\|_\infty = \|[h c/g]_q\|_\infty = \|h \cdot c/g\|_\infty \le \|h\| \cdot \|c/g\| \le \|h\| \cdot \|c\| \cdot \|g^{-1}\| \sqrt{n}.$$

The first inequality is a direct application of
the inequalities between the infinity norm of a product and the product of the
Euclidean norms, the second comes from [Gar13, Lemma 5.9].

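Since $h \leftarrow D_{R,\sqrt{q}}$ we have $\|h\| \le \sqrt{n}\, q^{1/2}$. Moreover, $c$ can be written as
a product of $\kappa$ level-1 encodings $u_i$, for $i = 1, \dots, \kappa$, i.e., $c = \prod_{i=1}^{\kappa} u_i$. Thus,
$\|c\| \le (\sqrt{n})^{\kappa-1} (\max_{i=1,\dots,\kappa} \|u_i\|)^{\kappa}$ since each of the $\kappa - 1$ multiplications brings
an extra $\sqrt{n}$ factor. Let $u_{\max}$ be one of the $u_i$ of largest norm. It can be written
as $u_{\max} = e \cdot a + \rho_1 \cdot b_1^{(1)} + \rho_2 \cdot b_2^{(1)}$. As we sampled the polynomial $g$ such that
$\|g^{-1}\| \le \ell_{g^{-1}}$, the inequality $\|[p_{zt} u]_q\|_\infty < q^{1-\xi}$ holds if:

$$n\, \ell_{g^{-1}} (\sqrt{n})^{\kappa-1} \left\|e \cdot a + \rho_1 \cdot b_1^{(1)} + \rho_2 \cdot b_2^{(1)}\right\|^{\kappa} < q^{1/2 - \xi}. \qquad (3)$$

Then, since

$$\left\|e \cdot a + \rho_1 \cdot b_1^{(1)} + \rho_2 \cdot b_2^{(1)}\right\|^{\kappa} \le \left(\|e\| \cdot \|a\| \sqrt{n} + \|\rho_1\| \cdot \|b_1^{(1)}\| \sqrt{n} + \|\rho_2\| \cdot \|b_2^{(1)}\| \sqrt{n}\right)^{\kappa},$$

$e \leftarrow D_{R,\sigma'}$, $a \leftarrow D_{1+\mathcal{I},\sigma'}$, $b_1^{(1)}, b_2^{(1)} \leftarrow D_{\mathcal{I},\sigma'}$ and $\rho_1, \rho_2 \leftarrow D_{R,\sigma_1^\star}$, we can bound
each of these values as $\|e\|, \|a\|, \|b_1^{(1)}\|, \|b_2^{(1)}\| \le \sigma' \sqrt{n}$ and $\|\rho_1\|, \|\rho_2\| \le \sigma_1^\star \sqrt{n}$ to get:

$$q > \left(n\, \ell_{g^{-1}} (\sqrt{n})^{\kappa-1} \left(\sigma'^2 n^{3/2} + 2 \sigma_1^\star \sigma' n^{3/2}\right)^{\kappa}\right)^{\frac{2}{1 - 2\xi}}. \qquad (4)$$

In [LSS14a], we had $\xi = 1/4$ (which gives $2/(1 - 2\xi) = 4$); hence
this analysis saves a factor of 2 in the size of $q$ even for the same $\xi$. If
we additionally consider $\xi < 1/4$, bigger improvements are possible. For practical
parameter sizes we reduce the size of $q$ by a factor of almost 4 because $\xi$ tends
towards zero as $\kappa$ and $\lambda$ grow.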

Correctness of Extraction. As in [LSS14a], we need that two level-$\kappa$ encodings
$u$ and $u'$ of different elements have different extracted elements, which implies
that we need: $\|[p_{zt}(u - u')]_q\|_\infty > 2^{L - \ell + 1}$ with $L = \lfloor \log q \rfloor$. This condition
follows from Lemma 2 with $t$ satisfying $t \cdot q^{1-\xi} > 2^{L - \ell + 1}$, which holds for $t =
q^{\xi} \cdot 2^{-\ell + 1}$. As a consequence, the condition $q > (2tn\sigma)^{1/\xi}$ is still satisfied if we
have $\ell > \log_2(8n\sigma)$, and to ensure that $t \ge 1$ we need that $\ell \le \xi \log q + 2$.
Finally, to ensure that $\varepsilon_{ext}$, the probability of the extraction to be the same for
two different elements, is negligible, we need that $\ell \le \xi \log_2 q - \log_2(2n/\varepsilon_{ext})$.

Picking $\xi$ and $q$. Putting all constraints together, we let $\ell = \log(8n\sigma)$ and



To find £ we solve i + A = . i 0 g q f or £ and set q = q i- 2 t . 




4.3 Lattice Attacks 

To pick a dimension $n$ we rely on lattice attacks. The most efficient lattice
attacks described in [GGH13a] rely on computing weak discrete logarithms and
hence do not seem to be applicable to either the case where no encodings of
zero are published or the case where such attacks are ruled out in some other
way. However, we may mount the attack from [CS97] against GGH-like graded
encoding schemes. We explain it in the symmetric setting. Assume two encodings
of random elements: $u_1 = [e_1/z]_q$ and $u_2 = [e_2/z]_q$. We have

$$\left[\frac{u_1}{u_2}\right]_q = \left[\frac{e_1/z}{e_2/z}\right]_q = \left[\frac{e_1}{e_2}\right]_q$$

with $e_1$ and $e_2$ small. We set up the lattice

$$\Lambda = \begin{pmatrix} qI & 0 \\ U & I \end{pmatrix}$$

where $I$ is the $n \times n$ identity matrix, $0$ is the $n \times n$ zero matrix, and $U$ a
rotational basis for $[u_1/u_2]_q$. By construction $\Lambda$ contains the vector $(e_1, e_2)$ which
is short. We have $\det(\Lambda) = q^n$ and $\|(e_1, e_2)\| \approx \sqrt{2n}\,\sigma'$. In contrast, a random
lattice with determinant $q^n$ and dimension $2n$ is expected to have a shortest vector of
norm $\approx q^{n/2n} = \sqrt{q}$, which is much longer than $\|(e_1, e_2)\|$. While $\Lambda$ does not
constitute a Unique-SVP instance because there are many short elements of norm
roughly $\sqrt{2n}\,\sigma'$, we may consider all of these "interesting". Clearly, there is a gap
between those "interesting" vectors and the expected length of short vectors for
random lattices. To hedge against potential attacks exploiting this gap, we may hence
want to ensure that finding those "interesting" short vectors is hard. The hardness of
Unique-SVP instances is determined by the ratio of the second shortest vector $\lambda_2(\Lambda)$
and the shortest vector $\lambda_1(\Lambda)$ of the lattice. We assume that the complexity of
finding a short element in $\Lambda$ depends on the gap between $(e_1, e_2)$ and $\sqrt{q}$ in a
similar way.

In order to succeed, an attacker needs to solve something akin to a Unique-SVP
instance with gap $\lambda_2(\Lambda)/\lambda_1(\Lambda)$. We need to pick parameters such that
this problem takes at least $2^\lambda$ operations. The most efficient technique known
in the literature to produce short lattice vectors is to run lattice reduction.
The quality of lattice reduction is typically expressed as the root-Hermite factor
$\delta_0$. An algorithm with root-Hermite factor $\delta_0$ is expected to output a vector
$v$ in a lattice $L$ of dimension $n$ such that $\|v\| = \delta_0^{n}\, \mathrm{vol}(L)^{1/n}$. Hence, in our case we
require $\tau \cdot \delta_0^{2n} \le \lambda_2(\Lambda)/\lambda_1(\Lambda)$ and thus

$$\delta_0 \le \left(\frac{\sqrt{q}}{\tau \cdot \sqrt{2n}\,\sigma'}\right)^{1/(2n)} \qquad (5)$$

where $\tau$ is a constant which depends on the lattice structure and on the reduction
algorithm used. Typically $\tau \approx 0.3$ [APS15], which we will use as an approximation.
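
As a worked illustration of this requirement, the following Python sketch computes the root-Hermite factor at which $\tau \cdot \delta_0^{2n}$ equals the gap, under the assumption (taken from the discussion above) that the gap is approximately $\sqrt{q}/(\sqrt{2n}\,\sigma')$. The function name and the sample value chosen for $\log_2 \sigma'$ are hypothetical; working with logarithms avoids overflow for moduli of several hundred bits.

```python
import math

def root_hermite_factor(log2_q, n, log2_sigma_prime, tau=0.3):
    """delta_0 solving tau * delta_0^(2n) = sqrt(q) / (sqrt(2n) * sigma')."""
    log2_gap = 0.5 * log2_q - 0.5 * math.log2(2 * n) - log2_sigma_prime
    log2_delta = (log2_gap - math.log2(tau)) / (2 * n)
    return 2.0 ** log2_delta

# Hypothetical inputs: log2(q) and n of the order used in this work,
# with an assumed value for log2(sigma').
delta = root_hermite_factor(log2_q=781.5, n=2 ** 14, log2_sigma_prime=60.0)
```

A smaller resulting $\delta_0$ means that stronger (more expensive) lattice reduction is needed to mount the attack.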

Currently, the most efficient algorithm for lattice reduction is a variant of the
BKZ algorithm [SE94] referred to as BKZ 2.0 [CN11]. However, its running time
and behaviour, especially in high dimensions, are not very well understood: there
is no consensus in the literature as to how to relate a given $\delta_0$ to computational
cost. We estimate the cost of lattice reduction as in [APS15].

We stress, though, that these assumptions require further scrutiny. Firstly,
this attack does not use $p_{zt}$, which means we expect that better lattice attacks
can be found eventually. Secondly, we are assuming that the lattice reduction
estimates in [APS15] are accurate. However, should these assumptions be falsified,
then this part of the analysis can simply be replaced by refined estimates.


4.4 Putting Everything Together 

Our overall strategy is as follows. Pick an $n$ and compute parameters $\sigma$, $\sigma'$, $\sigma_1^\star$ as
in [LSS14a] and $\ell_g$ and $q$ as in Sect. 4.2. Now, establish the root-Hermite factor
required to carry out the attack in Sect. 4.3 using Equation (5). If this $\delta_0$ is small
enough to satisfy security level $\lambda$, terminate; otherwise double $n$ and restart the
procedure.

We give choices of parameters in Table 2. 

Table 2. Parameter choices for multilinear jigsaw puzzles.

λ    κ     n      q             ‖enc‖      ‖params‖   δ_0        BKZ Enum    BKZ Sieve
52   2     2^14   ≈ 2^781.5     ≈ 2^23.6   ≈ 2^23.6   1.006855   ≈ 2^112.2   ≈ 2^101.8
52   4     2^15   ≈ 2^1469.0    ≈ 2^25.5   ≈ 2^25.5   1.007031   ≈ 2^110.4   ≈ 2^102.3
52   6     2^15   ≈ 2^2114.9    ≈ 2^26.0   ≈ 2^26.0   1.010477   ≈ 2^64.4    ≈ 2^83.3
52   10    2^15   ≈ 2^3406.8    ≈ 2^26.7   ≈ 2^26.7   1.017404   ≈ 2^53.5    ≈ 2^68.6
52   20    2^16   ≈ 2^7014.8    ≈ 2^28.8   ≈ 2^28.8   1.018311   ≈ 2^56.6    ≈ 2^71.7
52   40    2^17   ≈ 2^14599.3   ≈ 2^30.8   ≈ 2^30.8   1.019272   ≈ 2^59.6    ≈ 2^74.8
52   80    2^18   ≈ 2^30508.4   ≈ 2^32.9   ≈ 2^32.9   1.020258   ≈ 2^62.7    ≈ 2^77.8
52   160   2^18   ≈ 2^60827.8   ≈ 2^33.9   ≈ 2^33.9   1.040912   ≈ 2^54.0    ≈ 2^54.0
80   2     2^14   ≈ 2^837.5     ≈ 2^23.7   ≈ 2^23.7   1.007451   ≈ 2^98.2    ≈ 2^94.5
80   4     2^15   ≈ 2^1525.0    ≈ 2^25.6   ≈ 2^25.6   1.007330   ≈ 2^103.7   ≈ 2^98.8
80   6     2^16   ≈ 2^2287.2    ≈ 2^27.2   ≈ 2^27.2   1.005661   ≈ 2^160.9   ≈ 2^128.3
80   10    2^17   ≈ 2^3844.7    ≈ 2^28.9   ≈ 2^28.9   1.004882   ≈ 2^209.0   ≈ 2^150.9
80   20    2^18   ≈ 2^7824.9    ≈ 2^30.9   ≈ 2^30.9   1.005074   ≈ 2^198.9   ≈ 2^148.5
80   40    2^19   ≈ 2^16152.9   ≈ 2^33.0   ≈ 2^33.0   1.005294   ≈ 2^188.4   ≈ 2^145.7
80   80    2^20   ≈ 2^33546.4   ≈ 2^35.0   ≈ 2^35.0   1.005528   ≈ 2^179.7   ≈ 2^143.6
80   160   2^21   ≈ 2^69810.9   ≈ 2^37.1   ≈ 2^37.1   1.005769   ≈ 2^171.3   ≈ 2^141.4


5 Implementation 

Our implementation relies on FLINT [HJP14]. We use its data types to encode
elements in $\mathbb{Z}[X]$, $\mathbb{Q}[X]$, and $\mathbb{Z}_q[X]$ but re-implement most non-trivial operations
for the ring of integers of a cyclotomic number field where the degree is a
power of two. Other operations, such as Gaussian sampling or taking approximate
inverses, are not readily available in FLINT and are hence provided
by our implementation. For computation with elements in $\mathbb{R}$ we use MPFR's
mpfr_t [The13] with precision $2\lambda$ if not stated otherwise. Our implementation is
available under the GPLv2+ license at https://bitbucket.org/malb/gghlite-flint.
We give experimental results for computing multilinear maps using our implementation
in Table 1.

For all operations considered in this section naive algorithms are available in
$O(n^2 \log q)$ or $O(n^3 \log n)$ bit operations. However, the smallest set of parameters
we consider in Table 1 is $n = 2^{15}$, which implies that if implemented naively
each operation would take $2^{49}$ bit operations for the smallest set of parameters
we consider. Even quadratic algorithms can be prohibitively expensive. Hence, in
order to be feasible, all algorithms should run in quasi-linear time in $n$, or more
precisely in $O(n \log n)$ or $O(n \log^2 n)$. All algorithms discussed in this section
run in quasi-linear time.

5.1 Polynomial Multiplication in $\mathbb{Z}_q[X]/(X^n + 1)$

During the evaluation of a GGH-style graded encoding scheme multiplications of
polynomials in $\mathbb{Z}_q[X]/(X^n + 1)$ are performed. Naive multiplication takes $O(n^2)$
time in $n$. Asymptotically fast multiplication in this ring can be realised by first
reducing to multiplication in $\mathbb{Z}[X]$ and then to the Schönhage-Strassen algorithm
for multiplying large integers in $O(n \log n \log\log n)$. This is the strategy
implemented in FLINT, which has a highly optimised implementation of the
Schönhage-Strassen algorithm. Alternatively, we can get an $O(n \log n)$ algorithm
by using the Number-Theoretic Transform (NTT). Furthermore, using a
negative wrapped convolution we can avoid reductions modulo $(X^n + 1)$:

Theorem 1 (Adapted from [Win96]). Let $\omega_n$ be an $n$th root of unity in $\mathbb{Z}_q$
and $\varphi^2 = \omega_n$. Let $a = \sum_{i=0}^{n-1} a_i X^i$ and $b = \sum_{i=0}^{n-1} b_i X^i \in \mathbb{Z}_q[X]/(X^n + 1)$. Let
$c = a \cdot b \in \mathbb{Z}_q[X]/(X^n + 1)$ and let $\bar{a} = (a_0, \varphi a_1, \dots, \varphi^{n-1} a_{n-1})$ and define $\bar{b}$ and
$\bar{c}$ analogously. Then $\bar{c} = 1/n \cdot \mathrm{NTT}^{-1}_{\omega_n}(\mathrm{NTT}_{\omega_n}(\bar{a}) \odot \mathrm{NTT}_{\omega_n}(\bar{b}))$.

The NTT with a negative wrapped convolution has been used in lattice-based
cryptography before, e.g. [LMPR08]. We note that if we are doing many operations
in $\mathbb{Z}_q[X]/(X^n + 1)$ we can avoid repeated conversions between coefficient
and "evaluation" representations, $(f(1), f(\omega_n), \dots, f(\omega_n^{n-1}))$, of our elements,
which reduces the amortised cost from $O(n \log n)$ to $O(n)$. That is, we can convert
encodings to their evaluation representation once on creation and back only
when running extraction. We implemented this strategy and observe a considerable
overall speed-up from avoiding the conversions where possible. We also note that
operations on elements in their evaluation representation are embarrassingly parallel.
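
The negative wrapped convolution of Theorem 1 can be checked on a toy instance with the following Python sketch. It is an illustration under stated assumptions and not the FLINT-based code: the transform is a naive $O(n^2)$ evaluation rather than a fast NTT, and the parameters $n = 8$, $q = 17$, $\varphi = 3$ as well as all function names are chosen only for this example.

```python
def ntt(vec, omega, q):
    """Naive O(n^2) transform: evaluate vec at the powers of omega modulo q."""
    n = len(vec)
    return [sum(vec[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def negacyclic_mul_ntt(a, b, phi, q):
    """Multiply a and b in Z_q[X]/(X^n + 1); phi must have order 2n modulo q."""
    n = len(a)
    omega = phi * phi % q
    # Twist the inputs: a_bar_i = phi^i * a_i.
    a_bar = [a[i] * pow(phi, i, q) % q for i in range(n)]
    b_bar = [b[i] * pow(phi, i, q) % q for i in range(n)]
    # Pointwise product in the evaluation representation.
    prod = [x * y % q for x, y in zip(ntt(a_bar, omega, q), ntt(b_bar, omega, q))]
    # Inverse transform: transform with omega^{-1}, then scale by 1/n.
    n_inv = pow(n, -1, q)
    c_bar = [n_inv * x % q for x in ntt(prod, pow(omega, -1, q), q)]
    # Undo the twist: c_i = phi^{-i} * c_bar_i.
    phi_inv = pow(phi, -1, q)
    return [c_bar[i] * pow(phi_inv, i, q) % q for i in range(n)]

def negacyclic_mul_schoolbook(a, b, q):
    """Reference O(n^2) product with explicit reduction X^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q
    return c

# 3 has order 16 = 2n modulo 17, so phi^2 = 9 has order n = 8.
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
```

Both functions return the same coefficient vector for `a` and `b`; the twist by powers of $\varphi$ is what turns the cyclic convolution computed by the transform into a product modulo $X^n + 1$.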

5.2 Computing Norms in $\mathbb{Z}[X]/(X^n + 1)$

During instance generation we have to compute several norms of elements in
$\mathbb{Z}[X]/(X^n + 1)$. The norm $\mathcal{N}(f)$ of an element $f$ in $\mathbb{Z}[X]/(X^n + 1)$ is equal to
the resultant $\mathrm{res}(f, X^n + 1)$. The usual strategy for computing resultants over
the integers is to use a multi-modular approach. That is, we compute resultants
modulo many small primes $q_i$ and then combine the results using the
Chinese Remainder Theorem. Resultants modulo a prime $q_i$ can be computed
in $O(M(n) \log n)$ operations where $M(n)$ is the cost of one multiplication in
$\mathbb{Z}_{q_i}[X]/(X^n + 1)$. Hence, in our setting computing the norm costs $O(n \log^2 n)$
operations without specialisation.

However, we can observe that $\mathrm{res}(f, X^n + 1) \bmod q_i$ can be rewritten as
$\prod_{(X^n + 1)(x) = 0} f(x) \bmod q_i$ as $X^n + 1$ is monic, i.e. as evaluating $f$ on all roots
of $X^n + 1$. Picking $q_i$ such that $q_i \equiv 1 \bmod 2n$, this can be accomplished using
the NTT, reducing the cost mod $q_i$ to $O(M(n))$ and saving a factor of $\log n$, which
in our case is typically $\ge 15$.
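
The rewriting of the resultant as a product of evaluations can be illustrated with a short Python sketch. It is a toy version under stated assumptions: the roots of $X^n + 1$ modulo $q_i$ are found by exhaustive search instead of an NTT, only a single prime is handled (the Chinese Remainder step is omitted), and the function names are hypothetical.

```python
def roots_of_xn_plus_1(n, q):
    """All x in Z_q with x^n = -1 (exhaustive search, toy sizes only)."""
    return [x for x in range(q) if pow(x, n, q) == q - 1]

def norm_mod_q(f, q):
    """res(f, X^n + 1) mod q as the product of f over the roots of X^n + 1."""
    n = len(f)
    roots = roots_of_xn_plus_1(n, q)
    if len(roots) != n:
        raise ValueError("X^n + 1 does not split: need a prime q = 1 mod 2n")
    result = 1
    for x in roots:
        fx = 0
        for coeff in reversed(f):  # Horner evaluation of f at x
            fx = (fx * x + coeff) % q
        result = result * fx % q
    return result
```

For $f = 1 + 2X$ and $n = 4$ the norm over the integers is 17, so the function returns 17 modulo 41 and 0 modulo 17; the latter is exactly the kind of divisibility test by a small prime used in the following sections.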


5.3 Checking if $(g)$ is a Prime Ideal

While we show in Sect. 4.1 that we do not necessarily require a prime $(g)$, some
applications might still rely on this property. We hence provide an implementation
for sampling such $g$.

To check whether the ideal generated by $g$ is prime in $\mathbb{Z}[X]/(X^n + 1)$ we
compute the norm $\mathcal{N}(g)$ and check if it is prime, which is a sufficient but not
necessary condition. However, before computing full resultants, we first check if
$\mathrm{res}(g, X^n + 1) = 0 \bmod q_i$ for several "interesting" primes $q_i$. These primes are
2 and then all primes up to some bound with $q_i \equiv 1 \bmod n$ because these occur
with good probability as factors. We list timings in Table 3.

Table 3. Average time of checking primality of a single $(g)$ on Intel Xeon CPU E5-2667
v2 3.30 GHz with 256 GB of RAM using 16 cores.

n      log σ   wall time
1024   15.1    0.54 s
2048   16.2    3.03 s
4096   17.3    20.99 s


5.4 Verifying that $(b_1^{(1)}, b_2^{(1)}) = (g)$

If re-randomisation elements are required, then it is necessary that they generate
all of $(g)$, i.e. $(b_1^{(1)}, b_2^{(1)}) = (g)$. If $b_i^{(1)} = \tilde{b}_i^{(1)} \cdot g$ for $0 < i \le 2$ then this condition
is equivalent to $(\tilde{b}_1^{(1)}) + (\tilde{b}_2^{(1)}) = R$. We check the sufficient but not necessary
condition $\gcd(\mathrm{res}(\tilde{b}_1^{(1)}, X^n + 1), \mathrm{res}(\tilde{b}_2^{(1)}, X^n + 1)) = 1$, i.e. if the respective ideal
norms are co-prime. This check, which we have to perform for every candidate
pair $(\tilde{b}_1^{(1)}, \tilde{b}_2^{(1)})$, involves computing two resultants and their gcd, which is quite
expensive. However, we observe that $\gcd(\mathrm{res}(\tilde{b}_1^{(1)}, X^n + 1), \mathrm{res}(\tilde{b}_2^{(1)}, X^n + 1)) \ne 1$
when $\mathrm{res}(\tilde{b}_1^{(1)}, X^n + 1) = 0 = \mathrm{res}(\tilde{b}_2^{(1)}, X^n + 1) \bmod q_i$ for any modulus $q_i$. Hence,
we first check this condition for several "interesting" primes and resample if this
condition holds. These "interesting" primes are the same as in the previous
section. Only if these tests pass, we compute two full resultants and their gcd.
Indeed, after having ruled out small common prime factors it is quite unlikely
that the gcd of the norms is not equal to one, which means that with good
probability we will perform this expensive step only once as a final verification.
However, this step is still by far the most time-consuming step during setup even
with our optimisations applied. We note that a possible strategy for reducing
setup time is to sample $m > 2$ re-randomisers and to apply some bounds on
the probability of $m$ elements sharing a prime factor (after excluding small
prime factors).


5.5 Computing the Inverse of a Polynomial Modulo $X^n + 1$

Instance generation relies on inversion in $\mathbb{Q}[X]/(X^n + 1)$ in two places.
Firstly, when sampling $g$ we have to check that the norm of its inverse is bounded
by $\ell_g$. Secondly, to set up our discrete Gaussian samplers we need to run many
inversions in an iterative process. We note that for computing the zero-testing
parameter we only need to invert $g$ in $\mathbb{Z}_q[X]/(X^n + 1)$, which can be
realised in $n$ inversions in $\mathbb{Z}_q$ in the NTT representation.

In both cases where inversion in $\mathbb{Q}[X]/(X^n + 1)$ is required, approximate
solutions are sufficient. In the first case we only need to estimate the size of
$g^{-1}$ and in the second case inversion is a subroutine of an approximation
algorithm (see below). Hence, we implemented a variant of [BCMM98] to compute the
approximate inverse of a polynomial in $\mathbb{Q}[X]/(X^n + 1)$, with $n$ a power of two.

The core idea is similar to the FFT, i.e. to reduce the inversion of $f$ to
the inversion of an element of degree $n/2$. Indeed, since $n$ is even, $f(X)$ is
invertible modulo $X^n + 1$ if and only if $f(-X)$ is also invertible. By setting
$F(X^2) = f(X) f(-X) \bmod X^n + 1$, the inverse $f^{-1}(X)$ of $f(X)$ satisfies

$$F(X^2)\, f^{-1}(X) = f(-X) \pmod{X^n + 1}. \qquad (6)$$

Let $f^{-1}(X) = g(X) = G_e(X^2) + X G_o(X^2)$ and $f(-X) = F_e(X^2) + X F_o(X^2)$
be split into their even and odd parts respectively. From Eq. 6, we obtain
$F(X^2)\,(G_e(X^2) + X G_o(X^2)) = F_e(X^2) + X F_o(X^2) \pmod{X^n + 1}$, which is
equivalent to

$$\begin{cases}
F(X^2)\, G_e(X^2) = F_e(X^2) \pmod{X^n + 1} \\
F(X^2)\, G_o(X^2) = F_o(X^2) \pmod{X^n + 1}.
\end{cases}$$

From this, inverting $f(X)$ can be done by inverting $F(X^2)$ and multiplying
polynomials of degree $n/2$. It remains to recursively call the inversion of $F(Y)$
modulo $Y^{n/2} + 1$ (by setting $Y = X^2$). This leads to an algorithm for
approximately inverting elements of $\mathbb{Q}[X]/(X^n + 1)$ when $n$ is a power
of 2 which can be performed in $O(n \log^2(n))$ operations in $\mathbb{Q}$.

We give experimental results comparing Algorithm 1 with FLINT's extended
GCD algorithm in Table 4, which highlights that computing approximate inverses
instead of exact inverses is necessary for anything but toy instances.

Algorithm 1. Approximate inverse of $f(X) \bmod X^n + 1$ using prec bits of
precision

    if $n = 1$ then
        $g_0 \leftarrow f_0^{-1}$
    else
        $F(X^2) \leftarrow f(X) f(-X) \bmod X^n + 1$
        $\tilde{F}(Y) = F(Y)$ truncated to prec bits of precision
        $G(Y) \leftarrow$ InverseMod($\tilde{F}(Y)$, prec, $n/2$)
        Set $F_e(X^2), F_o(X^2)$ such that $f(-X) = F_e(X^2) + X F_o(X^2)$
        $T_e(Y), T_o(Y) \leftarrow G(Y) \cdot F_e(Y),\ G(Y) \cdot F_o(Y)$
        $f^{-1}(X) \leftarrow T_e(X^2) + X T_o(X^2)$
        $\tilde{f}^{-1}(X) \leftarrow f^{-1}(X)$ truncated to prec bits of precision
        return $\tilde{f}^{-1}(X)$
    end if
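
To make the recursion concrete, the following Python sketch follows the structure
of Algorithm 1 but works with exact rationals, i.e. it omits the truncation to
prec bits (this corresponds to the variant without truncation, not to the fast
approximate one). The function names are hypothetical, polynomials are lists of
`Fraction` coefficients of length $n$, and schoolbook multiplication is used for
brevity instead of a fast multiplication.

```python
from fractions import Fraction


def negacyclic_mul(a, b):
    """Multiply a and b in Q[X]/(X^n + 1), where n = len(a) = len(b)."""
    n = len(a)
    res = [Fraction(0)] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                res[k] += a[i] * b[j]
            else:
                res[k - n] -= a[i] * b[j]  # X^n = -1
    return res


def inverse_mod(f):
    """Exact inverse of f in Q[X]/(X^n + 1) for n = len(f) a power of two."""
    n = len(f)
    if n == 1:
        return [Fraction(1) / f[0]]
    f_neg = [c if i % 2 == 0 else -c for i, c in enumerate(f)]  # f(-X)
    big_f = negacyclic_mul(f, f_neg)[0::2]  # F(Y) with Y = X^2; odd terms vanish
    g = inverse_mod(big_f)                  # G(Y) = F(Y)^{-1} mod Y^(n/2) + 1
    t_even = negacyclic_mul(g, f_neg[0::2])  # G * F_e
    t_odd = negacyclic_mul(g, f_neg[1::2])   # G * F_o
    inv = [Fraction(0)] * n
    inv[0::2] = t_even  # f^{-1}(X) = T_e(X^2) + X * T_o(X^2)
    inv[1::2] = t_odd
    return inv
```

For example, with `f = [Fraction(c) for c in (3, 1, 0, 2)]`, the product
`negacyclic_mul(f, inverse_mod(f))` is the constant polynomial one.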


Table 4. Inverting $g$ with FLINT's extended Euclidean algorithm ("xgcd"),
our implementation with precision 160 ("160"), iterating our implementation until
$\|\tilde{f}^{-1}(X) \cdot f(X) - 1\| < 2^{-160}$ ("160iter") and our implementation
without truncation ("$\infty$") on Intel Core i7-4850HQ CPU at 2.30 GHz, single core.

  n      $\log \sigma$   xgcd       160       160iter   $\infty$
  4096   17.2            234.1 s    0.067 s   0.073 s   121.8 s
  8192   18.3            1476.8 s   0.195 s   0.200 s   755.8 s


5.6 Small Remainders 

The Jigsaw Generator as defined in [GGH+13b, Definition 8] takes as input
elements $a_i$ in $\mathbb{Z}_p$ where $p = \mathcal{N}(\mathcal{I})$ and produces
level encodings with respect to some source group $S_i$. In particular, this
algorithm produces some small representative of the coset modulo $\langle g \rangle$
from large integers of size $\approx (\sigma \sqrt{n})^n$ if we represent elements
in $\mathbb{Z}_p$ as integers $0 \le a_i < p$. This can be accomplished by using
Babai's trick and that $g$ is small, i.e. by computing
$a_i - g \cdot \lfloor g^{-1} \cdot a_i \rceil$ in $\mathbb{Q}[X]/(X^n + 1)$.
However, in order for this operation to produce sufficiently small elements, we
need $g^{-1}$ either exactly or with high precision. Computing such a high quality
approximation of $g^{-1}$ can be prohibitively expensive in terms of memory and
time. Our strategy for computing with a lower precision is to rewrite $a_i$ as

$$a_i = \sum_{j=0}^{\lceil \log_2(a_i)/B \rceil} 2^{B \cdot j} \cdot a_{ij}$$

where $a_{ij} < 2^B$ for some $B$. Then, we compute small representatives for all
$2^{B \cdot j}$ and $a_{ij}$ using an approximation of $g^{-1}$ with precision $B$.
Finally, we multiply the small representatives for $2^{B \cdot j}$ and $a_{ij}$ and
add up their products. This produces a somewhat short element which we then reduce
using our approximation of $g^{-1}$ with precision $B$ until its size does not
decrease any more.


5.7 Sampling from a Discrete Gaussian

While the strategy in Sect. 5.6 produces short elements, it does not necessarily
produce elements which follow a spherical Gaussian distribution and hence
do not leak geometric information about $g$. To produce such samples we need
to sample from the discrete Gaussian $D_{\langle g \rangle, \sigma', c}$ where $c$
is a small representative of a coset of $\langle g \rangle$. Furthermore, if
encodings of zero are published, we are required to sample from discrete Gaussians
over $\langle g \rangle$ centred at zero. For this, a fundamental building
block is to sample from the integer lattice. We implemented a discrete Gaussian
sampler over the integers both in arbitrary precision, using MPFR, and
in double precision, using machine doubles. For both cases we implemented
rejection sampling from a uniform distribution with and without table ("online")
lookups [GPV08] and Ducas et al.'s sampler which samples from
$D_{\mathbb{Z}, k\sigma_2}$ where $\sigma_2$ is a constant [DDLL13, Algorithm 12].
Our implementation automatically chooses the best algorithm based on $\sigma$, $c$
and $\tau$ (the tail cut). In our case $\sigma$ is typically relatively large, so
we call the latter whenever sampling with a centre $c \in \mathbb{Z}$ and the
former when $c \notin \mathbb{Z}$. We list example timings of our discrete
Gaussian sampler in Table 5. We note that in our implementation we,
conservatively, only make use of the arbitrary precision implementation of this
sampler with precision $2\lambda$.
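
As an illustration of the "online" variant of rejection sampling from a uniform
distribution, a minimal Python sketch in machine doubles could look as follows.
It is not the MPFR-based implementation timed below; the function name is
hypothetical, and we assume the convention that the Gaussian weight of $x$ is
$\exp(-(x - c)^2 / (2\sigma^2))$ with the support cut off at $\tau\sigma$ from
the centre.

```python
import math
import random


def sample_z(sigma, c, tau=6, rng=random):
    """Sample an integer approximately from D_{Z, sigma, c} by rejection."""
    lower = math.floor(c - tau * sigma)
    upper = math.ceil(c + tau * sigma)
    while True:
        x = rng.randint(lower, upper)  # uniform proposal on the cut-off support
        # Accept with probability proportional to the Gaussian weight of x;
        # the weight is recomputed on every call ("online", no lookup table).
        if rng.random() < math.exp(-((x - c) ** 2) / (2 * sigma ** 2)):
            return x
```

A tabulated variant would precompute the acceptance probabilities for all
integers in the support once, which only pays off when $\sigma$ and $c$ are
reused across many calls.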

Table 5. Example timings for discrete Gaussian sampling over $\mathbb{Z}$ on Intel
Core i7-4850HQ CPU at 2.30 GHz, single core.

                                                double             mpfr_t
  Algorithm                    $\sigma$   c     prec   samp./s     prec   samp./s
  Tabulated [GPV08, SampleZ]   10000      1.0   53     660.000     160    310.000
  Tabulated [GPV08, SampleZ]   10000      0.5   53     650.000     160    260.000
  Online [GPV08, SampleZ]      10000      1.0   53     414.000     160    9.000
  Online [GPV08, SampleZ]      10000      0.5   53     414.000     160    9.000
  [DDLL13, Algorithm 12]       10000      1.0   53     350.000     160    123.000


Using our discrete Gaussian sampler over the integers we implemented discrete
Gaussian samplers over lattices. Implemented naively this takes $O(n^3 \log n)$
operations even if we ignore issues of precision. Following [Duc13], we
implemented a variant of [Pei10] which we reproduce in Algorithm 2. Namely, we
first observe that $D_{\langle g \rangle, \sigma'} = g \cdot D_{R, \sigma' \cdot g^{-T}}$
and then use [Pei10, Algorithm 1] to sample from $D_{R, \sigma' \cdot g^{-T}}$
where $g^{-T}$ is the conjugate of $g^{-1}$. That is, $g^T_0 = g_0$ and
$g^T_{n-i} = -g_i$ for $1 \le i < n$ for $\deg(g) = n - 1$. We then proceed as
follows. We first compute an approximate square root (see below) of
$\Sigma_2' = g^{-T} \cdot g^{-1}$ up to $\lambda$ bits of precision. We perform
operations with $\log(n) + 4 \log(\sqrt{n}\,\|\sigma\|)$ bits of precision. If the
square root does not converge for this precision, we double it and start over. We
then use this value, scaled appropriately, as the initial value from which to
start computing a square root of
$\Sigma_2 = \sigma'^2 \cdot g^{-T} \cdot g^{-1} - r^2$ with
$r = 2 \cdot \lceil \sqrt{\log n} \rceil$. We terminate when the square of the
approximation is within distance $2^{-2\lambda}$ to $\Sigma_2$. This typically
happens quickly because our initial candidate is already very close to the
target value.

Given an approximation $\sqrt{\Sigma_2}'$ of $\sqrt{\Sigma_2}$ we then sample a
vector $x \in \mathbb{R}^n$ from a standard normal distribution and interpret it as
a polynomial in $\mathbb{Q}[X]/(X^n + 1)$. We then compute
$y = \sqrt{\Sigma_2}' \cdot x$ in $\mathbb{Q}[X]/(X^n + 1)$ and return
$g \cdot (\lfloor y \rceil_r)$, where $\lfloor y \rceil_r$ denotes sampling a vector
in $\mathbb{Z}^n$ where the $i$-th component follows $D_{\mathbb{Z}, r, y_i}$.
This algorithm is then easily extended to sample from arbitrary centres $c$. The
whole algorithm is summarised in Algorithm 3 and we give experimental results
in Table 6.

Algorithm 2. Computing an approximate square root of
$\sigma'^2 \cdot g^{-T} \cdot g^{-1} - r^2$.

    $p, s' \leftarrow \log n + 4 \log(\sqrt{n}\,\|\sigma\|),\ 1$
    $\Sigma_2' \leftarrow g^{-T} \cdot g^{-1}$
    while $\|s'^2 - \Sigma_2'\| > 2^{-\lambda}$ do
        $s'$ computed at prec. $p$ until $\|s'^2 - \Sigma_2'\| < 2^{-\lambda}$ or no more convergence
        $p \leftarrow 2p$
    end while
    $p, r \leftarrow p + 2 \log \sigma',\ 2 \cdot \lceil \sqrt{\log n} \rceil$
    $\Sigma_2 \leftarrow \sigma'^2 \cdot g^{-T} \cdot g^{-1} - r^2$
    $s$ computed at precision $p$ using $s'$ as initial approximation until $\|s^2 - \Sigma_2\| < 2^{-2\lambda}$
    return $s$

Algorithm 3. Sampling from $D_{\langle g \rangle, \sigma'}$

    $\sqrt{\Sigma_2}' \approx \sqrt{\sigma'^2 \cdot g^{-T} \cdot g^{-1} - r^2}$
    $x \in \mathbb{R}^n \leftarrow_\$ \rho_{1,0}$
    $x \leftarrow x$ considered as an element of $\mathbb{Q}[X]/(X^n + 1)$
    $y \leftarrow \sqrt{\Sigma_2}' \cdot x$
    return $g \cdot (\lfloor y \rceil_r)$
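
The conjugation used above is easy to check on small examples. The following
Python sketch (hypothetical helper names, integer coefficient lists, schoolbook
multiplication) computes $g^T$ from the rule $g^T_0 = g_0$, $g^T_{n-i} = -g_i$
and multiplies in $\mathbb{Z}[X]/(X^n + 1)$; it is meant only to illustrate that
$g^T \cdot g$ is fixed by conjugation and has $\|g\|^2$ as its constant term,
which is what makes it a sensible object to take a square root of.

```python
def negacyclic_mul(a, b):
    """Multiply a and b in Z[X]/(X^n + 1), where n = len(a) = len(b)."""
    n = len(a)
    res = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                res[k] += a[i] * b[j]
            else:
                res[k - n] -= a[i] * b[j]  # X^n = -1
    return res


def conjugate(g):
    """Return g^T, i.e. g with X replaced by X^{-1} = -X^(n-1)."""
    n = len(g)
    return [g[0]] + [-g[n - i] for i in range(1, n)]
```

For `g = [3, 1, 0, 2]` the product `negacyclic_mul(conjugate(g), g)` has constant
coefficient $3^2 + 1^2 + 0^2 + 2^2 = 14$ and is equal to its own conjugate.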


5.8 Approximate Square Roots 

Our Gaussian sampler requires an (approximate) square root in $\mathbb{Q}[X]/(X^n + 1)$.
That is, for some input element $\Sigma$ we want to compute some element
$\sqrt{\Sigma}' \in \mathbb{Q}[X]/(X^n + 1)$ such that
$\|\sqrt{\Sigma}' \cdot \sqrt{\Sigma}' - \Sigma\| < 2^{-2\lambda}$. We use iterative
methods as suggested in [Duc13, Section 6.5] which iteratively refine the
approximation of the square root similar to Newton's method. Computing approximate
square roots of matrices is a well studied research area with many algorithms known
in the literature (cf. [Hig97]). All algorithms with global convergence invoke
approximate inversions in $\mathbb{Q}[X]/(X^n + 1)$ for which we call our inversion
algorithm.

Table 6. Approximate square roots of
$\Sigma_2 = \sigma'^2 \cdot g^{-T} \cdot g^{-1} - r^2 \cdot I$ for discrete Gaussian
sampling over $\langle g \rangle$ with parameter $\sigma'$ on Intel Core i7-4850HQ
CPU at 2.30 GHz, 2 cores for Denman-Beavers, 4 cores for estimating the scaling
factor, one core for sampling. The last column lists the rate (samples per second)
of sampling from $D_{\langle g \rangle, \sigma'}$.

                              square root
  prec   n       $\log \sigma'$   iterations   wall time   $\log \|(\sqrt{\Sigma_2}')^2 - \Sigma_2\|$   $D_{\langle g \rangle, \sigma'}$/s
  160    1024    45.8             9            0.4 s       -200                                         26.0
  160    2048    49.6             9            0.9 s       -221                                         12.0
  160    4096    53.3             10           2.5 s       -239                                         5.1
  160    8192    57.0             10           8.6 s       -253                                         2.0
  160    16384   60.7             10           35.4 s      -270                                         0.8

We implemented the Babylonian method, the Denman-Beavers iteration
[DB76] and the Padé iteration [Hig97]. Although the Babylonian method only
involves one inversion, which allows us to compute with lower precision, we used
Denman-Beavers, since it converges faster in practice and can be parallelised
on two cores. While the Padé iteration can be parallelised on arbitrarily many
cores, the workload on each core is much greater than in the Denman-Beavers
iteration and in our experiments only improved on the latter when more than 8
cores were used.
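
For intuition, the Denman-Beavers iteration can be written down for a single
positive real number; this is a simplifying assumption made for illustration,
since the implementation works in $\mathbb{Q}[X]/(X^n + 1)$ with the approximate
inversion of Sect. 5.5 and with scaling. The two inversions in each step are
independent of each other, which is why the iteration parallelises on two cores.
The function name is hypothetical.

```python
def denman_beavers_sqrt(a, iterations=20):
    """Return (y, z) with y close to sqrt(a) and z close to 1/sqrt(a), for a > 0."""
    y, z = float(a), 1.0
    for _ in range(iterations):
        # Both updates use the values from the previous step, so the two
        # inversions could be computed in parallel.
        y, z = (y + 1.0 / z) / 2.0, (z + 1.0 / y) / 2.0
    return y, z
```

Calling `denman_beavers_sqrt(2.0)` yields an approximation of $\sqrt{2}$ together
with an approximation of its inverse.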

Most algorithms have quadratic convergence but in practice this does not
assure rapid convergence as the error can take many iterations to become small
enough for quadratic convergence to be observed. This effect can be mitigated,
i.e. convergence improved, by scaling the operands appropriately in each loop
iteration of the approximation [Hig97, Section 3]. A common scaling scheme is
to scale by the determinant, which in our case means computing
$\mathrm{res}(f, X^n + 1)$ for some $f \in \mathbb{Q}[X]/(X^n + 1)$. Computing
resultants in $\mathbb{Q}[X]/(X^n + 1)$ reduces to computing resultants in
$\mathbb{Z}[X]/(X^n + 1)$. As discussed above, computing resultants in
$\mathbb{Z}[X]/(X^n + 1)$ can be expensive. However, since we are only interested
in an approximation of the determinant for scaling, we can compute with reduced
precision. For this, we clear all but the most significant bit for each
coefficient's numerator and denominator of $f$ to produce $f'$ and compute
$\mathrm{res}(f', X^n + 1)$. The effect of clearing out the lower order bits of $f$
is to reduce the size of the integer representation in order to speed up the
resultant computation. With this optimisation, scaling by an approximation of the
determinant is both fast and precise enough to produce fast convergence. See
Table 6 for timings.


1 Introduction

The notion of key dependent message security [12] moves beyond our classical
notion of encryption security [22]. It demands a system remain secure even if an
attacker gains access to ciphertexts that encrypt messages that are, or depend on,
the very private keys of the system it is trying to attack. As a concrete example,
consider a special case of key-dependent security called n-circular security. Here
an encryption scheme is said to be n-circular secure if an adversary is unable
to distinguish $\mathrm{Enc}(pk_1, sk_2), \mathrm{Enc}(pk_2, sk_3), \ldots,
\mathrm{Enc}(pk_n, sk_1)$ from corresponding zero encryptions.

While the notion of key dependent or circular security might first appear
to be just a technical exercise, this very problem arises in multiple contexts.
Camenisch and Lysyanskaya [17] applied circular secure encryption to build an
anonymous credentials scheme with certain properties. Other works used circular
security in formal methods to prove the soundness of symbolic protocols [2,26].
Perhaps the most compelling example comes from Gentry [20], who showed that
a fully homomorphic scheme for limited depth can be "bootstrapped" to work
for arbitrary depth circuits if the original system is sufficient to compute its own
decryption circuit and is 1-circular secure.

The first positive examples of key-dependent message security were given in
the random oracle model by Black et al. [12] and Camenisch and Lysyanskaya
[17]. It was a significant time later that Boneh, Hamburg, Halevi and Ostrovsky
[14] gave an elegant construction of an n-circular secure encryption in the
standard model under the decision Diffie-Hellman assumption. Subsequently, a
sequence of further works [5,7-9,15,16] gave standard model constructions of key
dependent security for functions that could be arbitrary circuits on the private
key(s).

All the above constructions and proofs were based on encryption schemes 
with specific properties. A natural question is whether key-dependent message 
security is implied by IND-CPA (or IND-CCA) security. If this were true, we 
would get it for free, without needing such specific properties of the encryption 
scheme. 

A cursory examination of the problem shows that in the broadest sense the
answer is no. One can derive a simple counterexample for 1-circular security (i.e.,
a system that encrypts its own private key) by slightly modifying a public key
encryption system. To do so, simply augment a standard private key $K$ with a
randomly chosen $K' \in \{0, 1\}^\lambda$ and append $y = f(K')$ to the public key
where $f$ is a one way function. When encrypting a message $m = (m_1, m_2)$ the
system will give out the message in the clear if $f(m_2) = y$, and encrypt normally
otherwise. Clearly, an encryption of the private key will be detectable. Yet, if
the function $f$ is one way and the original system is IND-CPA secure, the
resulting system will still be IND-CPA secure.
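
The counterexample just described can be sketched in a few lines of Python. This
is an illustration only: SHA-256 stands in for the one way function $f$ (an
assumption, not part of the argument), the underlying public key scheme is passed
in as the hypothetical callables `base_keygen` and `base_encrypt`, and all names
are made up for this sketch.

```python
import hashlib
import secrets


def one_way(x: bytes) -> bytes:
    """Stand-in for the one way function f (assumption: SHA-256)."""
    return hashlib.sha256(x).digest()


def augmented_keygen(base_keygen):
    """Augment a base key pair (pk, K) with y = f(K') and K'."""
    base_pk, base_sk = base_keygen()
    k_prime = secrets.token_bytes(16)  # K' in {0,1}^lambda
    return (base_pk, one_way(k_prime)), (base_sk, k_prime)


def augmented_encrypt(base_encrypt, public_key, message):
    """Leak the message if its second half is a preimage of y; else encrypt."""
    base_pk, y = public_key
    m1, m2 = message
    if one_way(m2) == y:
        return ("clear", message)
    return ("ciphertext", base_encrypt(base_pk, message))
```

Encrypting the augmented secret key `(K, K')` under its own public key triggers
the first branch, whereas an adversary who does not know `K'` cannot produce a
message that does so without inverting $f$.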

While it can be trivially shown (by the argument above) that IND-CPA security
does not imply 1-circular security, the case for $n \ge 2$ becomes significantly
more challenging. Intuitively, when multiple public keys are thrown into the mix,
we need a system that is powerful enough to allow for different ciphertexts to
"talk" to each other in a manner that allows for cycle detection, but does not
compromise IND-CPA security. So far there have been two approaches to this.
For the case of $n = 2$, Acar et al. [1] and Cash, Green and Hohenberger [18]
showed how to construct a counterexample from a certain class of asymmetric
bilinear groups.$^1$ Here there must exist a bilinear map
$e : \mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$ where the decision
Diffie-Hellman problem is believed to remain hard respectively within
$\mathbb{G}_1$ and within $\mathbb{G}_2$ (this is called the SXDH assumption). A
second approach by Koppula, Ramchen and Waters [25] showed a counterexample under
the assumption of indistinguishability obfuscation for poly-sized circuits.
Independently and concurrently, Marcedone and Orlandi [27] showed this under the
stronger assumption of virtual black box obfuscation.

Our Goals and Results. In this work, we investigate new constructions of 
n-circular counterexamples with a focus on the case of n = 2. We have a partic- 
ular interest in what qualities a cryptosystem must have to be able to separate 
circular security from IND-CPA and IND-CCA security. 

To start, we ask whether there is something special about the asymmetry
in bilinear groups that is inherent in the works of [1,18,34] or whether it is
actually more the bilinearity that matters. As a further question, we explore
how to derive such counterexamples from other assumptions such as the Learning
with Errors (LWE) problem. If it were difficult to find such counterexamples,
this might bolster our confidence in using 2-circular encryption as a method
of bootstrapping [20] fully homomorphic encryption systems that are based on
lattice assumptions.

The results of this paper broadly expand the class of assumptions from which
we can build 2-circular counterexamples. We first show for any constant $k \ge 2$
how to build 2-circular counterexamples from a bilinear group under the decision
$k$-linear assumption. Recall that the decision $k$-linear assumption becomes
progressively weaker as $k$ becomes larger. This means that we can instantiate
counterexamples from symmetric bilinear groups and shows that asymmetric
groups do not have any inherently special property needed for this problem.
We then show how to create 2-circular counterexamples from the Learning with
Errors (LWE) problem. This extends the reach of these systems beyond bilinear
groups and obfuscation, giving us a much broader understanding of circular
security and its challenges.

Our Approach. We begin by introducing a new abstraction called an n-Cycle
Tester that will simplify the process of finding and describing counterexamples
by focusing on the core problem. A cycle tester consists of four algorithms
(Setup, KeyGen, Enc, Test). The algorithms Setup, KeyGen and Enc behave as in
a normal encryption scheme with a common trusted setup algorithm, while
the Test algorithm will take in an n-tuple of public keys and ciphertexts and
detect (with some non-negligible probability) the presence of a cycle. Notably
absent is the inclusion of a decryption algorithm. Thus, a tester does not require
that ciphertexts be decryptable in the traditional sense; it only matters that
the Test algorithm work with some non-negligible probability. We found that
relieving the responsibility of providing a system with decryption simplifies our
constructions and allows us to focus on the main ideas. The security property
required is IND-CPA security (recall that the basic IND-CPA game does not
involve a decryption algorithm).

1 In a similar vein, Rothblum [34] presented an elegant counterexample for
bit-encryption under a generalization of the SXDH assumption applied to
multilinear groups.

Of course, to obtain a full-fledged counterexample of an encryption system we
actually do need to provide an encryption system that decrypts. We show how to
generically derive such a counterexample for n-circular encryption by combining
a standard IND-CPA secure cryptosystem (of sufficient message length) with
an n-cycle tester. The idea is fairly straightforward. The setup algorithm of the
counterexample will run the respective setup algorithms of the encryption and
cycle tester schemes. The public key is the pair of these public keys and the
secret key is the pair of secret keys. To encrypt a message $m = (m_1, m_2)$, first
encrypt $m = (m_1, m_2)$ under the regular encryption system, then encrypt just
$m_2$ under the cycle tester. We can now see that: (1) the cycle tester will allow
for any key cycle to be detected and (2) the standard encryption scheme can be used
for decryption. A simple hybrid argument shows that the IND-CPA security of
the standard encryption scheme and cycle tester imply IND-CPA security of the
derived counterexample system.
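
The plumbing of this generic combination can be sketched as follows. The sketch
is purely structural: both component schemes are passed in as hypothetical
callables, the common setup algorithm is folded into key generation for brevity,
and no security is claimed for any particular instantiation.

```python
def combined_keygen(pke_keygen, tester_keygen):
    """Key pair of the derived scheme: pairs of the component keys."""
    pke_pk, pke_sk = pke_keygen()
    tester_pk, tester_sk = tester_keygen()
    return (pke_pk, tester_pk), (pke_sk, tester_sk)


def combined_encrypt(pke_encrypt, tester_encrypt, public_key, message):
    """Encrypt (m1, m2) under the regular scheme and m2 under the tester."""
    pke_pk, tester_pk = public_key
    m1, m2 = message
    return pke_encrypt(pke_pk, (m1, m2)), tester_encrypt(tester_pk, m2)


def combined_decrypt(pke_decrypt, secret_key, ciphertext):
    """Decryption only needs the regular scheme's half of the ciphertext."""
    return pke_decrypt(secret_key[0], ciphertext[0])


def detect_cycle(tester_test, public_keys, ciphertexts):
    """Run the tester's Test on the tester halves of keys and ciphertexts."""
    return tester_test([pk[1] for pk in public_keys],
                       [ct[1] for ct in ciphertexts])
```

Note that a secret key of the derived scheme has exactly the shape $(m_1, m_2)$
of a message, with the tester's secret key in the second slot, which is what
makes key cycles visible to `detect_cycle`.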

We also show that it is possible to extend this transformation idea to chosen 
ciphertext security, where we can combine any IND-CCA secure encryption sys- 
tem (of appropriate message length) with the same IND-CPA secure cycle tester 
to get an encryption system that is IND-CCA secure, but where encryption of 
key cycles can be detected. 

Again, the usefulness of this framework is its modularity. We show these basic 
transformations once in Sect. 4, and then for each construction we only need to 
focus on the basic cycle tester abstraction. 

A Cycle Tester from Asymmetric Bilinear Groups. As a baseline for our explo- 
ration (see [11] for the full details), we first create a 2-cycle tester from asym- 
metric groups using the SXDH assumption. Our construction is extracted from 
Cash et al. [18] (also similar to [1,34]), but simpler in that we only aim for the 
tester abstraction. 

In our construction, the Setup algorithm creates an asymmetric pairing
description $PP = (p, \mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T, e)$ of prime order
$p$. It also produces generators $g \in \mathbb{G}_1$ and $h \in \mathbb{G}_2$.
The message space will be $\mathbb{Z}_p^*$.

A key can be of one of two types. The cycle detection algorithm Test will
work on any cycle of keys of two different types. The key generation algorithm
KeyGen will first flip a coin $\beta \in \{0, 1\}$ to determine its type. It then
picks a random key $s \in \mathbb{Z}_p^*$. If $\beta = 0$, it sets its public key
to be $K = g^s \in \mathbb{G}_1$; otherwise, its public key is
$K = h^s \in \mathbb{G}_2$.

The encryption algorithm will choose a random exponent $t \in \mathbb{Z}_p$ and if
the key is of type $\beta = 0$, it produces the ciphertext as
$(C_1 = K^{tm} = g^{stm}, C_2 = g^t) \in \mathbb{G}_1^2$; otherwise if $\beta = 1$,
it produces the ciphertext as $(C_1 = K^{tm} = h^{stm}, C_2 = h^t) \in \mathbb{G}_2^2$.
With ciphertexts of this form, the test algorithm follows straightforwardly.
Suppose we had a pair of ciphertexts $y = (C = (C_1, C_2), C' = (C_1', C_2'))$
that encrypted a cycle for keys of different types. The algorithm can
test this by simply checking whether $e(C_1, C_2') = e(C_2, C_1')$. Plugging in
$s, s'$ as the respective keys, $t, t'$ as the encryption randomness, and $m, m'$
as the messages, we see that the test computes:

$$e(g^{stm}, h^{t'}) = e(g^{t}, h^{s't'm'}).$$

This equality holds if $m = s'$ and $m' = s$ and will not hold with high probability
for a message independent of the private key.

One thing we emphasize here is that IND-CPA is clearly broken if the SXDH
assumption does not hold. Consider an encryption
$(C_1 = K^{tm} = g^{stm}, C_2 = g^t) \in \mathbb{G}_1^2$ for the message $m$. The
group elements $g$, $(g^s)^m = g^{sm}$, $C_2 = g^t$, $C_1 = g^{stm}$ clearly form a
DDH tuple. So if DDH is easy in $\mathbb{G}_1$, any $\beta = 0$ type key
is susceptible to attack. An analogous statement holds in $\mathbb{G}_2$ for any
$\beta = 1$ key. This potential attack demonstrates that the above construction
relies strongly on properties of asymmetric groups. We next show how to remove
that reliance.

A Cycle Tester from the Decision k-Linear Assumption. We next move to
constructing a cycle tester from the decision $k$-linear assumption for any
constant $k \ge 2$. Recall that the $k$-linear assumption [24,35] is a
parameterized family of assumptions on the source elements of bilinear groups. The
assumption class becomes progressively weaker for larger values of $k$.
Importantly, by moving to the decision $k$-linear assumption we remove our
dependence on asymmetric groups.$^2$ See [11] for a review.

In our construction, the setup algorithm first generates a bilinear source
group $\mathbb{G}$ of prime order $p$ with generator $g$. Then it chooses a random
invertible (rank $k$) matrix $A \in \mathbb{Z}_p^{k \times k}$ and computes $g^A$,
which along with the group description forms the common public parameters. (We use
the notation $g^M$ as shorthand for the set of group elements resulting from
raising $g$ to each matrix entry in $M$.) The message and key spaces are defined
to be the set of rank $k$ matrices in $\mathbb{Z}_p^{k \times k}$.$^3$

Once again the key generation algorithm will flip a coin $\beta$ to determine its
type. Next it chooses a random $W$ from the set of invertible matrices in
$\mathbb{Z}_p^{k \times k}$. If $\beta = 0$ the key is $g^{AW}$; otherwise it is
$g^{WA}$.

2 We emphasize though that our constructions could use an asymmetric form of
bilinear maps if desired, although we describe things in terms of symmetric groups.
The main point is that there is no longer a reliance on asymmetry or that DDH is
hard within each group.

3 In our scheme, we actually let the message and key space be $\{0, 1\}^\lambda$
for security parameter $\lambda$ and define a pseudorandom generator from this to
rank $k$ matrices. That way the message space is defined before the common setup is
executed. However, for simplicity we will just assume here that the message and key
spaces are the set of invertible $k \times k$ matrices.

The encryption algorithm takes as input a message $M \in \mathbb{Z}_p^{k \times k}$
and then computes its inverse $M^{-1}$. (Recall the message space is the set of
invertible matrices.) If the type bit $\beta = 0$, the algorithm chooses a random
row vector $r$ of length $k$ in $\mathbb{Z}_p$ (i.e. a random matrix of dimension
$1 \times k$). The ciphertext is computed and output as $C_1 = g^{rAW}$,
$C_2 = g^{rAM^{-1}}$. Thus, the ciphertext will consist of two row vectors in the
exponent. We observe all terms are computable from the public keys and public
parameters. If the type bit $\beta = 1$, the algorithm chooses a random column
vector $r$ of length $k$ in $\mathbb{Z}_p$ (i.e., a random matrix of dimension
$k \times 1$). The ciphertext is computed and output as $C_1 = g^{WAr}$,
$C_2 = g^{M^{-1}Ar}$.
Now suppose we have two ciphertexts $y = (C = (C_1, C_2), C' = (C_1', C_2'))$ of
different types (with the first being of $\beta = 0$). We can then test for a cycle
by testing if $e(C_1, C_2') = e(C_1', C_2)$. To see why, suppose we had a cycle, so
we have that $M'^{-1} = W^{-1}$ and $M^{-1} = W'^{-1}$. Then, in the exponent, it
follows that:

$$\begin{aligned}
r A W M'^{-1} A r' &= r A M^{-1} W' A r' \\
r A I A r' &= r A I A r' \\
r A^2 r' &= r A^2 r'.
\end{aligned}$$

So if there is a cycle, the test will output 1. In contrast, if the messages
encrypted are independent of the key, the test will output 0 with high probability.

Finally, we can give a simple proof of IND-CPA security from the decision
$k$-linear assumption. More specifically, we will use the matrix $k$-linear
assumption, introduced by Naor and Segev [29], that was shown to be equivalent to
the decision $k$-linear assumption. Informally, the assumption says that it is hard
to distinguish $g^X$ and $g^Y$ where $X$ is a random matrix of rank $i \ge k$ and
$Y$ is a random matrix (of the same dimension) of rank $j \ge k$. That is, the rank
of matrices in the exponent cannot be determined as long as it is at least $k$. For
our purposes, we will be interested in using the difficulty of distinguishing
between rank $k$ and rank $k + 1$ matrices.

Let us examine IND-CPA security for an encryption under a type $\beta = 0$ key.
(The argument for $\beta = 1$ will follow analogously.) We will devise a reduction
algorithm that receives a matrix $k$-linear assumption challenge $g^M$, where $M$
is selected as either a random rank $k$ matrix or rank $k + 1$ matrix. In the case
where it is a rank $k$ matrix, our reduction algorithm will use it to derive the key
and ciphertext values of

$$g^{A},\ g^{AW},\ g^{rAW},\ g^{rA}.$$

These can be used to generate a well-formed ciphertext of a given message.
However, if the reduction algorithm receives a random matrix of rank $k + 1$, it
will create key and ciphertext values distributed as

$$g^{A},\ g^{AW},\ g^{rAW},\ g^{uA}.$$

In this case the fact that $u$ is fresh randomness will information-theoretically
hide the message from the attacker. It then follows that any attacker with
non-negligible advantage against our system must break the matrix $k$-linear
assumption.

In the full version [11], we present a different 2-cycle tester from the Decision
Linear assumption in symmetric pairing groups. This construction can be viewed
as closer to an extension of the SXDH one (sketched above and detailed in [11]) to
symmetric groups where new variables and equations are introduced to prevent
the use of pairings to disrupt IND-CPA security. However, it does not seem
to generalize to a system that is secure using the decision $k$-linear assumption
for $k > 2$ or help move toward a Learning with Errors Assumption. At the
same time, when compared to our more general construction just given for the
$k = 2$ (decision linear assumption) case, it achieves smaller public keys. Public
keys here are two group elements as opposed to four. Our techniques for this
construction might be of future interest for other applications of transforming
constructions proved under asymmetric group assumptions to those that do not
rely on them. We defer further details of these techniques to the full version [11].

A Cycle Tester from Learning with Errors Assumption. While there are now
many known examples of cryptographic functionalities that can be achieved in
both the bilinear and lattice settings, it is not at all clear how to imitate the
pairings-based approach above to obtain a cycle tester from the LWE assumption.
Typically, encryption schemes proven secure under LWE have ciphertexts
that are large, noisy vectors in $\mathbb{Z}_q^m$ and secret keys that are short
vectors in $\mathbb{Z}^m$, with decryption computing a dot product and then
removing the small effect of the noise multiplied by the short key vector. It seems
unlikely that we could build a cycle tester using only this kind of structure, as
the cycle effect would be obscured by the interactions of large ciphertext vectors
with the embedded noise.

Intuitively, we then expect that a cycle tester may use ciphertexts that have 
two parts: a noisy vector and a short vector. The large, noisy vectors will help us 
prove IND-CPA security from LWE, while the short vectors will help us perform 
the cycle test. Naturally, the main challenge is designing the relationship between 
the noisy and short vectors such that the short vectors do not break security 
when there is no cycle. 

The secret key for our scheme consists of a matrix $B$ and a corresponding
short trapdoor basis $T_B$. For IND-CPA security, it is important that $B$ is
hidden, so one should ignore the notational collision and not think of this as
corresponding to the public matrix $A$ in an LWE challenge, but rather the columns
of $B$ will play the role of different hidden $s$ vectors in typical LWE notation.
The public key will be formed by choosing several random vectors
$c_1, \ldots, c_\ell$ and publishing noisy versions of $c_1 B, \ldots, c_\ell B$ as
well as the (non-noisy) vectors $c_1, \ldots, c_\ell$ (so these $c_i$'s can be
thought of as playing the role of the public matrix $A$ in an LWE challenge).

To encrypt a message, the message will first be used to generate a matrix $Z$
and a corresponding short trapdoor basis $T_Z$. The encryptor will mimic typical
LWE-style encryption by forming a noisy version of $sB$ for some vector $s$, but
since it does not know $B$, it will form $s$ as a linear combination of
$c_1, \ldots, c_\ell$ with coefficients chosen randomly from $\{-1, 1\}$. Note that
the encryptor can then compute both $s$ (without noise) and a noisy version of
$sB$. The noisy version of $sB$ becomes the noisy part of the ciphertext, and the
other part of the ciphertext is a short vector $v$ such that $Zv$ equals the
transpose of $s$. Note that such a vector $v$ can be sampled appropriately using
the trapdoor basis $T_Z$.
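
The first half of this encryption procedure, forming $s$ and a noisy version of
$sB$ without knowing $B$, can be sketched as follows. The dimensions, the modulus,
the bounded uniform noise and all function names are assumptions made for this
illustration only; the trapdoor sampling of the short vector $v$ is not shown.

```python
import random


def publish_noisy_products(B, q, ell, noise_bound, rng):
    """Key-generation side: pick c_1..c_ell and publish noisy c_i * B."""
    n, m = len(B), len(B[0])
    cs, noisy = [], []
    for _ in range(ell):
        c = [rng.randrange(q) for _ in range(n)]
        e = [rng.randint(-noise_bound, noise_bound) for _ in range(m)]
        cB = [sum(c[i] * B[i][j] for i in range(n)) % q for j in range(m)]
        cs.append(c)
        noisy.append([(cB[j] + e[j]) % q for j in range(m)])
    return cs, noisy


def encryptor_vectors(cs, noisy, q, rng):
    """Encryptor side: s and a noisy s * B from public data only."""
    signs = [rng.choice((-1, 1)) for _ in cs]
    n, m = len(cs[0]), len(noisy[0])
    s = [sum(sg * c[i] for sg, c in zip(signs, cs)) % q for i in range(n)]
    noisy_sB = [sum(sg * p[j] for sg, p in zip(signs, noisy)) % q
                for j in range(m)]
    return s, noisy_sB
```

Because the same signs are applied to the $c_i$ and to the published noisy
products, the second output differs from $sB$ only by a signed sum of the
original noise terms, whose entries are bounded by $\ell$ times the noise bound.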

For full details of how the cycle test works, see Sect. 6. The main idea is
that when there is a 2-cycle, the secret key matrix $B$ for one ciphertext is the
same as the message matrix $Z$ for the other ciphertext and vice versa. This
leads to a common relationship between the short vector of one ciphertext and
the noisy vector of the other, while when the $B$, $Z$ matrices of each are fresh
and unrelated, this relationship does not appear. One convenient feature of this
scheme as compared to the bilinear schemes is that there is no need for different
types of ciphertexts. Intuitively, the pairing relationship has been replaced by a
dot product relationship between a short vector and a noisy one.

Proving IND-CPA security for this scheme can be accomplished in a few 
steps. First, since $B$ is hidden and its columns act like the hidden vector $s$ in
a typical LWE challenge and the $c_i$'s act like rows of the public matrix $A$, we
can argue that LWE implies the noisy public versions of $c_i B$ can be replaced
by uniformly random vectors, independent of the $c_i$'s and $B$. Next, using a
convenient variant of the leftover hash lemma from [3], we argue that the random
coefficients in $\{-1, 1\}$ that form $s$ from the $c_i$'s and the noisy ciphertext vector
from the public noisy vectors supply sufficient entropy to replace both of these
with fresh uniformly random vectors as well. We are then left with an encryption
that samples a uniformly random $s$ (now independent of the noisy part of the
ciphertext) and samples the short part of the ciphertext as a short vector $v$ such
that $Zv$ is the transpose of $s$. Here we can argue that the distribution of such a $v$
is statistically close to a distribution that is independent of $Z$: this follows from
a result in [21] that ensures that the image of a short, Gaussian distributed
vector $v$ under multiplication by $Z$ is uniformly distributed in $\mathbb{Z}_q^n$. Thus, by
employing LWE followed by a sequence of statistical arguments, we can arrive 
at a point where the ciphertext is independent of the message, and this implies 
IND-CPA security. 

Other Related Work. Haitner and Holenstein [23] show black box impossibility 
results for proving key-dependent message security from different cryptographic 
assumptions. Their goal deviates from ours in two important ways. First, their
work focuses on impossibility results for a ciphertext encrypting functions of its
own private key, whereas we are concerned with the circular case where there
is a cycle over multiple private keys. Second, we are interested in concrete
counterexamples. In particular, it may be possible that IND-CPA security implies
certain key-dependent security properties even if there does not exist any
black-box reduction. In contrast, our counterexamples will show that this is impossible
if certain specific number-theoretic assumptions hold.

2 Preliminaries 

Background on pairings can be found in the full version [11].

2.1 The k-LIN Assumption

Decision Linear and the k-LIN Family (k-LIN). We now present a family
of assumptions called the k-LIN assumptions (where $k = 1$ is the standard DDH
assumption and $k = 2$ is called Decision Linear [13]) [10,24]. Let $G$ be a group
of prime order $p \in \Theta(2^\lambda)$. For all p.p.t. adversaries $\mathcal{A}$ and $k \ge 1$, the following
probability is 1/2 plus an amount negligible in $\lambda$:

$$\Pr[\, g, g_1, \ldots, g_k \leftarrow G;\ r_1, \ldots, r_k \leftarrow \mathbb{Z}_p;\ T_0 = g^{(r_1 + \cdots + r_k)};\ T_1 \leftarrow G;\ d \leftarrow \{0,1\};$$
$$d' \leftarrow \mathcal{A}(g, g_1, \ldots, g_k, g_1^{r_1}, \ldots, g_k^{r_k}, T_d) : d = d' \,].$$

In the generic group model, these k-LIN assumptions become progressively
weaker for increasing $k$.

In our proof of security in Sect. 5 we will use a theorem due to Naor and
Segev [29] showing that under the decision k-linear assumption no attacker
can distinguish between a random rank $i$ matrix and a random rank $j$ matrix
(in the exponent and of the same dimensions) for $i, j \ge k$.
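
To make the shape of a k-LIN challenge concrete, the following Python sketch samples one in a toy group. Everything specific here is an assumption for illustration: the group is the order-1019 subgroup of squares modulo the safe prime 2039 (far too small for any security), and `rand_elem` and `klin_instance` are hypothetical helper names. The exponents are returned only so the relation can be checked; a real challenger would keep them secret.

```python
import random

P, p = 2039, 1019  # toy safe prime P = 2p + 1; the squares mod P form a group of order p

def rand_elem(rng):
    """Random non-identity element of the order-p subgroup of squares mod P."""
    while True:
        g = pow(rng.randrange(2, P - 1), 2, P)
        if g != 1:
            return g

def klin_instance(k, d, rng):
    """Return (g, [g_i], [g_i^{r_i}], T_d, [r_i]) for challenge bit d."""
    g = rand_elem(rng)
    gs = [rand_elem(rng) for _ in range(k)]
    rs = [rng.randrange(p) for _ in range(k)]
    T0 = pow(g, sum(rs) % p, P)   # the "real" target g^(r_1 + ... + r_k)
    T1 = rand_elem(rng)           # an unrelated group element
    powers = [pow(gi, ri, P) for gi, ri in zip(gs, rs)]
    return g, gs, powers, (T0 if d == 0 else T1), rs
```

With `k = 1` this is a DDH-style tuple and with `k = 2` a Decision Linear tuple, matching the two named special cases above.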

2.2 Lattices and LWE 

We let $q$, $n$, and $m$ denote positive integers. Given a matrix $A \in \mathbb{Z}_q^{n \times m}$, we let
$\Lambda_q^{\perp}(A)$ denote the lattice $\{x \in \mathbb{Z}^m : Ax = 0 \bmod q\}$. For $u \in \mathbb{Z}_q^n$, we let $\Lambda_q^{u}(A)$
denote the set $\{x \in \mathbb{Z}^m : Ax = u \bmod q\}$.

For a matrix $A \in \mathbb{Z}^{n \times m}$, we let $\|A\|$ denote the $\ell_2$ length of the longest
column of $A$, and we let $\|A\|_{GS}$ denote $\|\tilde{A}\|$, where $\tilde{A}$ is the Gram-Schmidt
orthogonalization of the columns of $A$. We let $A^t$ denote the transpose of the
matrix $A$.

Learning with Errors (LWE). Given integers $n$, $m$, a prime $q$, and a noise
distribution $\chi$ over $\mathbb{Z}$, the $(n, m, q, \chi)$-LWE problem is to distinguish the distributions
$(A, A^t s + e)$ and $(A, u)$, where $A$ is chosen uniformly from $\mathbb{Z}_q^{n \times m}$, $s$ is chosen
uniformly from $\mathbb{Z}_q^n$, $e$ is chosen from $\chi^m$, and $u$ is chosen uniformly from $\mathbb{Z}_q^m$.

Under a quantum reduction, Regev [33] showed that for certain noise
distributions, the LWE problem is as hard as the worst-case SIVP and GapSVP.
Peikert [31] gave a reduction in the classical setting. Our construction will admit
a range of parameters where solving the LWE problem is as hard as approxi- 
mating the worst-case GapSVP to polynomial (in n) factors, which is believed 
to be computationally hard. 

Trapdoor Generation. We will rely on the polynomial time algorithm
TrapGen($1^n, 1^m, q$) (developed in [4,6,28]). This is a randomized algorithm that
when given $m = O(n \log q)$, outputs a full rank matrix $A \in \mathbb{Z}_q^{n \times m}$ and an
accompanying basis $T_A \in \mathbb{Z}^{m \times m}$ for $\Lambda_q^{\perp}(A)$ such that the distribution of $A$ is
negligibly close (in $n$) to uniform over $\mathbb{Z}_q^{n \times m}$ and $\|T_A\|_{GS} = O(\sqrt{n \log q})$ with
all but negligible probability (as a function of $n$).

Discrete Gaussian Distributions. We employ the discrete Gaussian distribution
$\mathcal{D}_\sigma(\Lambda_q^{u}(A))$ on $\Lambda_q^{u}(A)$, parameterized by $\sigma > 0$ (as defined e.g. in [33]). The salient
fact we will use about this distribution is that for a random matrix $A \in \mathbb{Z}_q^{n \times m}$
and $\sigma = \Omega(\sqrt{n})$, a vector sampled from $\mathcal{D}_\sigma(\Lambda_q^{u}(A))$ has $\ell_2$ norm less than $\sigma\sqrt{m}$
with probability at least 1 minus a quantity that is negligible in $m$.

We will rely on a polynomial time algorithm SampleD($A, T_A, u, \sigma$) [21]. This
is a randomized algorithm that when $\sigma = \|T_A\|_{GS} \cdot \omega(\sqrt{\log m})$, produces a
random vector $x$ from a distribution that is statistically close to $\mathcal{D}_\sigma(\Lambda_q^{u}(A))$.

We also employ the following result from [21] (appears as Corollary 5.4 in 
that work): 

Lemma 1. Let $n$ and $q$ be positive integers with $q$ prime, and let $m \ge 2n \log q$.
Then for all but a $2q^{-n}$ fraction of all $A \in \mathbb{Z}_q^{n \times m}$ and for any $\sigma \ge \omega(\sqrt{\log m})$,
the distribution of the syndrome $u = Ae \bmod q$ is statistically close to uniform
over $\mathbb{Z}_q^n$, where $e$ is distributed according to $\mathcal{D}_{\mathbb{Z}^m, \sigma}$.

Randomness Extraction. We will use the leftover hash lemma (see, e.g., [3] for an
even stronger statement):

Lemma 2. Suppose that $\ell > (j + 1) \log q + \omega(\log j)$ and $q > 2$ is prime (for
integers $q, j, \ell$). Let $R$ be an $\ell \times 1$ vector chosen uniformly in $\{1, -1\}^\ell \bmod q$.
Let $A$ and $B$ be matrices chosen uniformly in $\mathbb{Z}_q^{j \times \ell}$ and $\mathbb{Z}_q^{j \times 1}$ respectively. Then,
the distribution $(A, AR)$ is statistically close to the distribution $(A, B)$.

3 Security Definitions 

In this work, we will focus on public key encryption schemes that admit a global 
setup algorithm. 

Definition 1 (Public Key Encryption). A public key encryption scheme
$\Pi$ = (Setup, KeyGen, Enc, Dec) for a message space $\mathcal{M}$ and secret key space $\mathcal{S}$ (see footnote 4)
is a tuple of algorithms specified as follows:

- Setup($1^\lambda$) → PP. The Setup algorithm takes as input the security parameter
$\lambda$ and outputs common public parameters PP.

- KeyGen(PP) → (pk, sk). The Key Generation algorithm takes as input the
public parameters PP and outputs a public key pk and secret key $sk \in \mathcal{S}$.

- Enc(pk, $m \in \mathcal{M}$) → C. The Encryption algorithm takes as input a public key
pk and a message $m \in \mathcal{M}$ and outputs a ciphertext C.

- Dec(sk, C) → m. The Decryption algorithm takes as input a secret key sk and
a ciphertext C and outputs either an error message $\bot$ or a value $m \in \mathcal{M}$.

By negl($k$) we denote some negligible function, i.e., one such that, for all
$c > 0$ and all sufficiently large $k$, negl($k$) $< 1/k^c$. We abbreviate probabilistic
polynomial time as PPT. 

4 Technically, the output of the Setup algorithm may be required to establish the 
message and secret key spaces. For instance, the setup algorithm may output a 
prime p and the message space might be set as Z*. For simplicity, we provide a 
name for these sets at the scheme level, even though the elements in these sets may 
not be defined until after Setup.

Perfect Correctness. An encryption scheme $\Pi$ = (Setup, KeyGen, Enc, Dec) for
message space $\mathcal{M}$ is said to be perfectly correct if for all $\lambda \in \mathbb{N}$, $m \in \mathcal{M}$, and
(pk, sk) $\in$ KeyGen(Setup($1^\lambda$)), it holds that Dec(sk, Enc(pk, m)) = m.

Security. We recall the notion of indistinguishability of encryptions under a 
chosen-plaintext attack [22]. 

Definition 2 (IND-CPA Security). Let $\Pi$ = (Setup, KeyGen, Enc, Dec) be a
public-key encryption scheme. For scheme $\Pi$, adversary $\mathcal{A}$, and $\lambda \in \mathbb{N}$, let
the random variable IND-CPA($\Pi, \mathcal{A}, \lambda$) be defined by the probabilistic algorithm
described on the left side of Fig. 1. We denote the IND-CPA advantage of $\mathcal{A}$ by
$\mathrm{Adv}^{\mathrm{cpa}}_{\Pi,\mathcal{A}}(\lambda) = 2 \cdot \Pr[\text{IND-CPA}(\Pi, \mathcal{A}, \lambda) = 1] - 1$. We say that $\Pi$ is IND-CPA secure
if $\mathrm{Adv}^{\mathrm{cpa}}_{\Pi,\mathcal{A}}(\lambda)$ is negligible for all PPT $\mathcal{A}$.

We also consider the indistinguishability of encryptions under a chosen-ciphertext 
attack [19,30,32]. 

Definition 3 (IND-CCA Security). Let $\Pi$ = (Setup, KeyGen, Enc, Dec) be a
public-key encryption scheme. Let the random variable IND-CCA($\Pi, \mathcal{A}, \lambda$) be
defined by an algorithm identical to IND-CPA($\Pi, \mathcal{A}, \lambda$) above, except that $\mathcal{A}$ has
access to an oracle Dec(sk, ·) that returns the output of the decryption algorithm
and $\mathcal{A}$ cannot query this oracle on input $y$. We denote the IND-CCA advantage of
$\mathcal{A}$ by $\mathrm{Adv}^{\mathrm{cca}}_{\Pi,\mathcal{A}}(\lambda) = 2 \cdot \Pr[\text{IND-CCA}(\Pi, \mathcal{A}, \lambda) = 1] - 1$. We say that $\Pi$ is IND-CCA
secure if $\mathrm{Adv}^{\mathrm{cca}}_{\Pi,\mathcal{A}}(\lambda)$ is negligible for all PPT $\mathcal{A}$.


3.1 Circular Security 

We next define circular security of public-key encryption. This definition is 
derived from the Key-Dependent Message (KDM) security notion of Black et al. 
[12]. We follow prior counterexample definitions [1,18], which restrict the
adversary's power (e.g., the adversary cannot ask for any affine function of the secret keys). The
adversary is asked to distinguish between an encryption cycle and encryptions of
zero as in [14,18]. The bit string zero is not actually in the message spaces we
consider, but this value can be encoded to be in the space; equivalently, one can 
follow the approach of Acar et al. [1] which instead of zero, encrypts a fresh 
random message. 

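Definition 4 (IND-CIRC-CPA$^n$). Let $\Pi$ = (Setup, KeyGen, Enc, Dec) be a
public-key encryption scheme. For integer $n > 0$, scheme $\Pi$, adversary $\mathcal{A}$, and $\lambda \in \mathbb{N}$,
let the random variable IND-CIRC-CPA$^n$($\Pi, \mathcal{A}, \lambda$) be defined by the probabilistic
algorithm in the middle of Fig. 1. We denote the IND-CIRC-CPA$^n$ advantage of
$\mathcal{A}$ by

$$\mathrm{Adv}^{n\text{-circ-cpa}}_{\Pi,\mathcal{A}}(\lambda) = 2 \cdot \Pr[\text{IND-CIRC-CPA}^n(\Pi, \mathcal{A}, \lambda) = 1] - 1.$$

We say that $\Pi$ is IND-CIRC-CPA$^n$ secure if $\mathrm{Adv}^{n\text{-circ-cpa}}_{\Pi,\mathcal{A}}(\lambda)$ is negligible for all
PPT $\mathcal{A}$.

(Left) IND-CPA($\Pi, \mathcal{A}, \lambda$)

    $b \leftarrow \{0, 1\}$
    PP $\leftarrow$ Setup($1^\lambda$)
    (pk, sk) $\leftarrow$ KeyGen(PP)
    $(m_0, m_1) \leftarrow \mathcal{A}(pk)$
    $y \leftarrow$ Enc($pk, m_b$)
    $\hat{b} \leftarrow \mathcal{A}(y)$
    Output $(\hat{b} = b)$

(Middle) IND-CIRC-CPA$^n$($\Pi, \mathcal{A}, \lambda$)

    $b \leftarrow \{0, 1\}$
    PP $\leftarrow$ Setup($1^\lambda$)
    For $i = 1$ to $n$:
        $(pk_i, sk_i) \leftarrow$ KeyGen(PP)
    If $b = 1$ then
        $\mathbf{y} \leftarrow$ EncCycle($\mathbf{pk}, \mathbf{sk}$)
    Else
        $\mathbf{y} \leftarrow$ EncZero($\mathbf{pk}, \mathbf{sk}$)
    $\hat{b} \leftarrow \mathcal{A}(\mathbf{pk}, \mathbf{y})$
    Output $(\hat{b} = b)$

(Right) EncCycle($\mathbf{pk}, \mathbf{sk}$)

    For $i = 1$ to $n$:
        $m_i := sk_{(i \bmod n) + 1}$
        $y_i \leftarrow$ Enc($pk_i, m_i$)
    Output $\mathbf{y}$

(Right) EncZero($\mathbf{pk}, \mathbf{sk}$)

    For $i = 1$ to $n$:
        $m_i := 0^{|sk_{(i \bmod n) + 1}|}$
        $y_i \leftarrow$ Enc($pk_i, m_i$)
    Output $\mathbf{y}$

Fig. 1. Experiments for Definitions 2 and 4, each for a message space $\mathcal{M}$, and we assume
that $m_0, m_1, sk_i \in \mathcal{M}$. We write $\mathbf{pk}$, $\mathbf{sk}$, and $\mathbf{y}$ for $(pk_1, \ldots, pk_n)$, $(sk_1, \ldots, sk_n)$ and
$(y_1, \ldots, y_n)$ respectively.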
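
The circular-security experiment of Fig. 1 can be written as a small Python harness. This is a sketch under stated assumptions rather than part of the paper: schemes are passed in as plain callables, secret keys are byte strings, and the `toy_*` functions form a deliberately broken scheme (public key equal to secret key, XOR as "encryption") that exists only so the harness can be run end to end.

```python
import secrets

def enc_cycle(enc, pks, sks):
    """y_i <- Enc(pk_i, sk_{(i mod n)+1}), written with 0-based indices."""
    n = len(pks)
    return [enc(pks[i], sks[(i + 1) % n]) for i in range(n)]

def enc_zero(enc, pks, sks):
    """y_i <- Enc(pk_i, 0^{|sk_{(i mod n)+1}|})."""
    n = len(pks)
    return [enc(pks[i], bytes(len(sks[(i + 1) % n]))) for i in range(n)]

def ind_circ_cpa(setup, keygen, enc, adversary, n, lam):
    """One run of the experiment; True iff the adversary guesses the bit b."""
    b = secrets.randbelow(2)
    pp = setup(lam)
    keys = [keygen(pp) for _ in range(n)]
    pks = [pk for pk, _ in keys]
    sks = [sk for _, sk in keys]
    y = enc_cycle(enc, pks, sks) if b == 1 else enc_zero(enc, pks, sks)
    return adversary(pks, y) == b

# Deliberately insecure toy scheme, used only to exercise the harness.
def toy_setup(lam):
    return lam // 8                      # key length in bytes

def toy_keygen(pp):
    sk = secrets.token_bytes(pp)
    return sk, sk                        # pk == sk

def toy_enc(pk, m):
    return bytes(a ^ b for a, b in zip(pk, m))

def toy_adversary(pks, y):
    # Undo the XOR under the first key and compare with the next key in the cycle.
    return int(toy_enc(pks[0], y[0]) == pks[1 % len(pks)])
```

Against the toy scheme the adversary wins essentially always, so its advantage $2 \cdot \Pr[\cdot] - 1$ is close to 1; a circular-secure scheme would force this quantity to be negligible.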


Discussion. Cash et al. [18] made a distinction between whether an adversary 
could distinguish an encryption cycle from encryptions of zero (as in the standard 
game above), or whether an adversary could actually recover the secret keys 
(and provided the latter type of counterexample). Recently, Koppula et al. [25]
showed that if there exists an (IND-CPA-secure) scheme with a PPT adversary
that can distinguish an encryption cycle (in the standard game), then it can
be transformed into another scheme with a corresponding adversary that can 
extract the secret keys from the cycle. Thus, in this work, we can focus exclusively 
on the standard definition. 

4 A Framework for Generating Circular Counterexamples 

We now present a general framework for creating circular security counterexam- 
ples, which we will instantiate under a variety of different assumptions in the 
subsequent sections. At the center of our framework is an abstraction called a 
“cycle tester”. Like an encryption scheme, a cycle tester must be able to encode a
message in an IND-CPA secure manner. However, unlike an encryption scheme,
the cycle tester need not support a decryption operation; instead, it must support
a testing operation that can detect the presence of an encryption cycle.

After formalizing this abstraction, we provide two results that use it. First, we 
show how our tester can be combined with any IND-CPA encryption scheme (of 
appropriate message length) to provide a full-blown counterexample. Second,
we extend this idea to show how to combine any tester with any IND-CCA 
encryption scheme to get an IND-CCA counterexample. 

In addition to letting us focus on a narrower primitive for our counterex- 
ample, this separation avoids duplication of work and minimizes assumptions. 
In particular, we can design a single tester and then both the IND-CPA and 
IND-CCA counterexamples follow. Most prior works did not address IND-CCA
counterexamples. While Cash et al. [18] did, their IND-CCA counterexample
required the use of NIZKs, which is a stronger assumption than simply assum- 
ing the existence of IND-CCA encryption schemes as we do here. Our abstraction 
and transformation essentially show that designing IND-CCA counterexamples 
is no harder than designing IND-CPA ones. 

We remark that Koppula et al. [25] have an IND-CPA counterexample with
a structure similar to our general transformation; however, no generic or IND-CCA
theorems are proven there.

Definition 5 (n-Cycle Tester). A cycle tester $\Gamma$ = (Setup, KeyGen, Enc, Test)
for message space $\mathcal{M}$ and secret key space $\mathcal{S}$ is a tuple of algorithms specified as
follows:

- Setup($1^\lambda$) → PP. The Setup algorithm takes as input the security parameter
$\lambda$ and outputs common public parameters PP.

- KeyGen(PP) → (pk, sk). The Key Generation algorithm takes as input the
public parameters PP and outputs a public key pk and secret key $sk \in \mathcal{S}$.

- Enc(pk, $m \in \mathcal{M}$) → C. The Encryption algorithm takes as input a public key
pk and a message $m \in \mathcal{M}$ and outputs a ciphertext C.

- Test($\mathbf{pk}, \mathbf{y}$) → {0, 1}. On input $\mathbf{pk} = (pk_1, \ldots, pk_n)$ and $\mathbf{y} = (C_1, \ldots, C_n)$,
the Testing algorithm outputs a bit in {0, 1}.

It also must possess the following properties. Let $\Pi$ = (Setup, KeyGen, Enc, ·) be
an encryption scheme formed from the first three algorithms of the tester with
an empty decryption algorithm. Then, it must hold that:

1. (IND-CPA security) $\Pi$ is IND-CPA secure according to Definition 2.

2. (Testing Correctness) the Testing algorithm's advantage in distinguishing
encryption cycles, denoted $\mathrm{Adv}^{n\text{-circ-cpa}}_{\Pi,\mathrm{Test}}(\lambda)$ from Definition 4, is non-negligible.

We now prove two theorems. 

Theorem 1 (CPA Counterexample from Cycle Testers). If there exists
an IND-CPA-secure encryption scheme $\Pi$ for message space $\mathcal{M} = (\mathcal{M}_1 \times \mathcal{M}_2)$
and secret key space $\mathcal{S}_1 \subseteq \mathcal{M}_1$ and an n-cycle tester $\Gamma$ for message space $\mathcal{M}_2$
and secret key space $\mathcal{S}_2 \subseteq \mathcal{M}_2$, then there exists an IND-CPA-secure encryption
scheme $\Pi'$ for message space $\mathcal{M} = (\mathcal{M}_1 \times \mathcal{M}_2)$ and secret key space $\mathcal{S} = (\mathcal{S}_1 \times \mathcal{S}_2)$
that is n-circular insecure.

Proof. Let $\Pi$ = (Setup$_1$, KeyGen$_1$, Enc$_1$, Dec$_1$) and $\Gamma$ = (Setup$_2$, KeyGen$_2$, Enc$_2$,
Test$_2$). We construct an IND-CPA $\Pi'$ = (Setup, KeyGen, Enc, Dec), together with
its IND-CIRC-CPA$^n$ test algorithm Test, as follows.

Setup($1^\lambda$): On input $1^\lambda$, run PP$_1$ $\leftarrow$ Setup$_1$($1^\lambda$) and PP$_2$ $\leftarrow$ Setup$_2$($1^\lambda$). Output
PP = (PP$_1$, PP$_2$).

KeyGen(PP): On input PP = (PP$_1$, PP$_2$), run $(pk_1, sk_1) \leftarrow$ KeyGen$_1$(PP$_1$) and
$(pk_2, sk_2) \leftarrow$ KeyGen$_2$(PP$_2$). Output $pk = (pk_1, pk_2)$ and $sk = (sk_1, sk_2)$.

Enc($pk, m$): On input $pk = (pk_1, pk_2)$ and $m = (m_1, m_2) \in \mathcal{M}$, run $c_1 \leftarrow$
Enc$_1$($pk_1, (m_1, m_2)$) and $c_2 \leftarrow$ Enc$_2$($pk_2, m_2$). Output $C = (c_1, c_2)$.

Dec($sk, C$): On input $sk = (sk_1, sk_2)$ and $C = (c_1, c_2)$, output Dec$_1$($sk_1, c_1$).

Test($\mathbf{pk}, \mathbf{y}$): On input $\mathbf{pk} = (pk_1, \ldots, pk_n)$ and $\mathbf{y} = (C_1, \ldots, C_n)$, parse $pk_i =
(a_i, b_i)$ and $C_i = (c_i, d_i)$ and output the bit Test$_2$($(b_1, \ldots, b_n), (d_1, \ldots, d_n)$).

The correctness of Test follows directly from that of Test$_2$. If $(\mathbf{pk}, \mathbf{y})$
contains an encryption cycle (or encryptions of zero, respectively), then so will
$((b_1, \ldots, b_n), (d_1, \ldots, d_n))$, and thus by definition of the cycle tester, the test will
distinguish between these cases with non-negligible advantage.

It remains to argue that $\Pi'$ is an IND-CPA secure encryption scheme. This
follows by a simple hybrid argument based on the fact that an encryption in $\Pi'$
is a pair of encryptions from two different IND-CPA-secure schemes, $\Gamma$ and $\Pi$.
We omit this proof as it is a simplified version of the IND-CCA proof that we
provide next.
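
The composition used in this proof is purely structural, so it can be sketched generically in Python. The layout below is an assumption of this sketch, not the paper's notation: both the scheme and the tester are passed as dictionaries of callables, and `toy_pi` and `toy_gamma` are insecure stand-ins (the tester even sends its message in the clear) that serve only to make the wiring executable.

```python
import secrets

def compose(pi, gamma):
    """Wire an encryption scheme pi and a cycle tester gamma into the combined scheme."""
    def setup(lam):
        return pi["setup"](lam), gamma["setup"](lam)

    def keygen(pp):
        pk1, sk1 = pi["keygen"](pp[0])
        pk2, sk2 = gamma["keygen"](pp[1])
        return (pk1, pk2), (sk1, sk2)

    def enc(pk, m):
        c1 = pi["enc"](pk[0], m)          # Enc_1 receives the whole pair (m_1, m_2)
        c2 = gamma["enc"](pk[1], m[1])    # Enc_2 receives only m_2
        return c1, c2

    def dec(sk, c):
        return pi["dec"](sk[0], c[0])     # the tester component is ignored

    def test(pks, cts):
        return gamma["test"]([pk[1] for pk in pks], [c[1] for c in cts])

    return {"setup": setup, "keygen": keygen, "enc": enc, "dec": dec, "test": test}

def _toy_keygen(pp):
    sk = secrets.token_bytes(pp)
    return sk, sk                         # pk == sk: trivially insecure

toy_pi = {
    "setup": lambda lam: lam // 8,
    "keygen": _toy_keygen,
    "enc": lambda pk, m: (pk, m),
    "dec": lambda sk, c: c[1] if c[0] == sk else None,
}
toy_gamma = {
    "setup": lambda lam: lam // 8,
    "keygen": _toy_keygen,
    "enc": lambda pk, m: m,
    "test": lambda pks, cts: int(all(ct == pks[(i + 1) % len(pks)]
                                     for i, ct in enumerate(cts))),
}
```

The point of the design is that Test only ever looks at the second components, so any cycle among the combined keys induces a cycle among the tester keys.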

Theorem 2 (CCA Counterexample from Cycle Testers). Let $k, \ell$ be security
parameters and $p(\cdot)$ be a polynomial. If there exists an IND-CCA-secure
encryption scheme $\Pi$ (with $k$-bit secret keys and $(p(\ell) + 2k)$-bit messages) and
an n-cycle tester $\Gamma$ (with $k$-bit secret keys, $k$-bit messages, and $p(\ell)$-bit
ciphertexts), then there exists an IND-CCA-secure encryption scheme $\Pi'$ for $2k$-bit
messages that is n-circular insecure.

Proof. Let $\Pi$ = (Setup$_1$, KeyGen$_1$, Enc$_1$, Dec$_1$) and $\Gamma$ = (Setup$_2$, KeyGen$_2$, Enc$_2$,
Test$_2$) with the length constraints above. We construct an IND-CCA $\Pi'$ =
(Setup, KeyGen, Enc, Dec), together with its IND-CIRC-CPA$^n$ test algorithm Test,
as follows. We can no longer simply append the cycle-tester encryption to the
regular encryption, because changes to the cycle-testing portion might be
leveraged to obtain a decryption of a portion of the challenge ciphertext. Instead, we
also encrypt this cycle-testing portion using the regular CCA-secure scheme.

Setup($1^\lambda$): On input $1^\lambda$, run PP$_1$ $\leftarrow$ Setup$_1$($1^\lambda$) and PP$_2$ $\leftarrow$ Setup$_2$($1^\lambda$). Output
PP = (PP$_1$, PP$_2$).

KeyGen(PP): On input PP = (PP$_1$, PP$_2$), run $(pk_1, sk_1) \leftarrow$ KeyGen$_1$(PP$_1$) and
$(pk_2, sk_2) \leftarrow$ KeyGen$_2$(PP$_2$). Output $pk = (pk_1, pk_2)$ and $sk = (sk_1, sk_2)$.

Enc($pk, (m_a, m_b)$): On input $pk = (pk_1, pk_2)$ and message $(m_a, m_b) \in \{0,1\}^k \times
\{0,1\}^k$, run $c_2 \leftarrow$ Enc$_2$($pk_2, m_b$) and $c_1 \leftarrow$ Enc$_1$($pk_1, (m_a, m_b, c_2)$). Output
$C = (c_1, c_2)$.

Dec($sk, C$): On input $sk = (sk_1, sk_2)$ and $C = (c_1, c_2)$, run Dec$_1$($sk_1, c_1$). If
it does not return a message of the form $(m_a, m_b, m_c) \in \{0,1\}^k \times \{0,1\}^k \times
\{0,1\}^{p(\ell)}$ or if $m_c \ne c_2$, then output $\bot$ (invalid ciphertext). Otherwise, output
the message $(m_a, m_b) \in \{0,1\}^k \times \{0,1\}^k$.

Test($\mathbf{pk}, \mathbf{y}$): On input $\mathbf{pk} = (pk_1, \ldots, pk_n)$ and $\mathbf{y} = (C_1, \ldots, C_n)$, parse $pk_i =
(a_i, b_i)$ and $C_i = (c_i, d_i)$ and output the bit Test$_2$($(b_1, \ldots, b_n), (d_1, \ldots, d_n)$).
This is the same as before.
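
The key difference from the CPA construction is that the tester ciphertext $c_2$ is also placed inside the CCA-secure ciphertext $c_1$, and decryption rejects any mismatch. A minimal Python sketch of just this Enc/Dec wiring follows; the function `make_cca_wrapper` and the three toy primitives are hypothetical names introduced here, the toy primitives offer no security at all, and `None` plays the role of $\bot$.

```python
def make_cca_wrapper(enc1, dec1, enc2):
    """Return (enc, dec) for the combined scheme, given Enc_1, Dec_1 and Enc_2."""
    def enc(pk, msg):
        m_a, m_b = msg
        c2 = enc2(pk[1], m_b)                 # tester ciphertext of m_b
        c1 = enc1(pk[0], (m_a, m_b, c2))      # c_2 is bound inside c_1
        return c1, c2

    def dec(sk, ct):
        c1, c2 = ct
        inner = dec1(sk[0], c1)
        well_formed = isinstance(inner, tuple) and len(inner) == 3
        if not well_formed or inner[2] != c2:
            return None                       # reject: malformed or c_2 was altered
        return inner[0], inner[1]

    return enc, dec

# Toy stand-ins (no security): a labelled tuple acts as a "ciphertext".
def toy_enc1(pk, m):
    return ("box", pk, m)

def toy_dec1(sk, c):
    return c[2] if c[1] == sk else None       # in this toy, pk_1 == sk_1

def toy_enc2(pk, m):
    return ("tester-ct", pk, m)
```

Swapping out the second component of a ciphertext makes the check `inner[2] != c2` fail, which is exactly the behaviour the games in the following proof rely on.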

As before, the correctness of Test follows directly from that of Test$_2$. If $(\mathbf{pk}, \mathbf{y})$
contains an encryption cycle (or encryptions of zero, respectively), then so will
$((b_1, \ldots, b_n), (d_1, \ldots, d_n))$, and thus by definition of the cycle tester, the test will
distinguish between these cases with non-negligible advantage.

4.1 Proving IND-CCA Security via a Sequence of Games

It remains to argue that $\Pi'$ is an IND-CCA secure encryption scheme. This
proof is significantly more involved than the IND-CPA case. We prove this using
a sequence of games from an encryption of a message $M_0$ to an encryption of
$M_1$ (where these messages come from the IND-CCA game). The public and
secret keys are always distributed as in the real scheme, but the structure of
the challenge ciphertext changes in each hybrid. We underline these changes for
the reader. Let the challenge messages be described as $M_0 = (m_{0,a}, m_{0,b})$ and
$M_1 = (m_{1,a}, m_{1,b})$. Then the hybrids are as follows:

Game 1. This corresponds to the original security game IND-CCA($\Pi', \mathcal{A}, \lambda$)
in which the challenger interacts with adversary $\mathcal{A}$, except that the challenge
ciphertext is always an encryption of message $M_0$.

1. Run Setup($1^\lambda$) to produce PP and then KeyGen(PP) to produce (pk, sk).

2. On decryption query $C_i$ from $\mathcal{A}$, output Dec($sk, C_i$).

3. Provide the challenge ciphertext as $C^* = (c_1^*, c_2^*)$, where $c_1^* =$ Enc$_1$($pk_1,
(m_{0,a}, m_{0,b}, c_2^*)$) and $c_2^* =$ Enc$_2$($pk_2, m_{0,b}$). This is a valid encryption of $M_0$.

4. On decryption query $C_i \ne C^*$ from $\mathcal{A}$, output Dec($sk, C_i$).

Game 2. This is the same as Game 1, except that we change how the second
phase of decryption queries is answered: we reject all requests where the first
portion of the query matches the first portion of the challenge.

1. Run Setup($1^\lambda$) to produce PP and then KeyGen(PP) to produce (pk, sk).

2. On decryption query $C_i$ from $\mathcal{A}$, output Dec($sk, C_i$).

3. Provide the challenge ciphertext as $C^* = (c_1^*, c_2^*)$, where $c_1^* =$ Enc$_1$($pk_1, (m_{0,a},
m_{0,b}, c_2^*)$) and $c_2^* =$ Enc$_2$($pk_2, m_{0,b}$). This is a valid encryption of $M_0$.

4. On decryption query $C_i = (c_{i,1}, c_{i,2}) \ne C^*$ from $\mathcal{A}$, if $c_{i,1} = c_1^*$ output $\bot$,
otherwise output Dec($sk, C_i$).


Game 3. This is the same as Game 2, except that we now encrypt $M_1$ in
the cycle tester portion and continue to encrypt $M_0$ in the regular encryption
portion. We continue to reject all decryption queries where the regular encryption
portion matches the challenge.

1. Run Setup($1^\lambda$) to produce PP and then KeyGen(PP) to produce (pk, sk).

2. On decryption query $C_i$ from $\mathcal{A}$, output Dec($sk, C_i$).

3. Provide the challenge ciphertext as $C^* = (c_1^*, c_2^*)$, where $c_1^* =$ Enc$_1$($pk_1, (m_{0,a},
m_{0,b}, c_2^*)$) and $c_2^* =$ Enc$_2$($pk_2, m_{1,b}$).

4. On decryption query $C_i = (c_{i,1}, c_{i,2}) \ne C^*$ from $\mathcal{A}$, if $c_{i,1} = c_1^*$ output $\bot$,
otherwise output Dec($sk, C_i$).

Game 4. This is the same as Game 3, except that now the entire challenge
ciphertext is an encryption of $M_1$. As before, we continue to reject all decryption
queries where the regular encryption portion matches the challenge.

1. Run Setup($1^\lambda$) to produce PP and then KeyGen(PP) to produce (pk, sk).

2. On decryption query $C_i$ from $\mathcal{A}$, output Dec($sk, C_i$).

3. Provide the challenge ciphertext as $C^* = (c_1^*, c_2^*)$, where $c_1^* =$ Enc$_1$($pk_1, (m_{1,a},
m_{1,b}, c_2^*)$) and $c_2^* =$ Enc$_2$($pk_2, m_{1,b}$).

4. On decryption query $C_i = (c_{i,1}, c_{i,2}) \ne C^*$ from $\mathcal{A}$, if $c_{i,1} = c_1^*$ output $\bot$,
otherwise output Dec($sk, C_i$).

Game 5. This is the same as Game 4, except now all decryption queries are
answered as normal. The challenge ciphertext always contains an encryption of
$M_1$.

1. Run Setup($1^\lambda$) to produce PP and then KeyGen(PP) to produce (pk, sk).

2. On decryption query $C_i$ from $\mathcal{A}$, output Dec($sk, C_i$).

3. Provide the challenge ciphertext as $C^* = (c_1^*, c_2^*)$, where $c_1^* =$ Enc$_1$($pk_1, (m_{1,a},
m_{1,b}, c_2^*)$) and $c_2^* =$ Enc$_2$($pk_2, m_{1,b}$).

4. On decryption query $C_i \ne C^*$ from $\mathcal{A}$, output Dec($sk, C_i$).


4.2 Adversary’s Probability of Outputting 1 in These Games 

Let $\mathrm{Prob}_i$ denote the probability that adversary $\mathcal{A}$ outputs a 1 in Game $i$. We
will now show, by a series of steps, that for any adversary $\mathcal{A}$ the difference in
its probability of outputting 1 between Game 1 (encryption of $M_0$) and Game 5
(encryption of $M_1$) is negligible. Thus, it cannot distinguish between these two
games.

Claim. For any adversary $\mathcal{A}$, $\mathrm{Prob}_1 = \mathrm{Prob}_2$.

Proof. These games are identical except that in Game 1 all decryption queries
$C_i = C^*$ are rejected whereas in Game 2 all decryption queries $C_i = (c_{i,1}, c_{i,2})$
such that $c_{i,1} = c_1^*$ for $C^* = (c_1^*, c_2^*)$ are rejected. This results, however, in
identical behavior on the decryption queries. Whenever $c_{i,1} \ne c_1^*$, both games
answer the queries normally. Whenever $C_i = C^*$, neither game answers this
illegal challenge query. On $c_{i,1} = c_1^*$ but $c_{i,2} \ne c_2^*$, Game 2 will output $\bot$.
However, Game 1's response is also to reject this query with the message $\bot$ for
being a non-valid ciphertext, since the decryption of $c_1^*$ results in an intermediate
tuple of the form $(m_{0,a}, m_{0,b}, c_2^*)$ and the decryption algorithm checks that $c_2^* =
c_{i,2}$, which won't be true in this case. Thus, the adversary gets identical responses
to its decryption queries (and everything else) in both games. Since the games
are identical from the adversary's viewpoint, it will output 1 with the same
probability.

Claim. If $\Gamma$ is an IND-CPA-secure n-cycle tester with security parameter $\lambda$, then
for any adversary $\mathcal{A}$, $\mathrm{Prob}_3 - \mathrm{Prob}_2 \le \mathrm{negl}(\lambda)$.

Proof. We show that an attacker's probability of outputting 1 cannot be
non-negligibly different in Games 2 and 3, because that would imply an attack on
the IND-CPA security of the cycle tester. More formally, suppose there exists an
adversary $\mathcal{A}$ such that $\mathrm{Prob}_3 - \mathrm{Prob}_2 = \epsilon$. Then we can construct an adversary
$\mathcal{B}$ that uses $\mathcal{A}$ to show that $\Gamma$ is not an IND-CPA-secure n-cycle tester. $\mathcal{B}$ works
as follows:

1. $\mathcal{B}$ runs Setup$_1$($1^\lambda$) → PP$_1$ and KeyGen$_1$(PP$_1$) → $(pk_1, sk_1)$.

2. $\mathcal{B}$ obtains the public key $pk_2$ from the IND-CPA encryption challenger.

3. $\mathcal{B}$ sends $pk = (pk_1, pk_2)$ to $\mathcal{A}$.

4. $\mathcal{A}$ returns two messages $M_0 = (m_{0,a}, m_{0,b})$ and $M_1 = (m_{1,a}, m_{1,b})$.

5. $\mathcal{B}$ sends $(m_{0,b}, m_{1,b})$ to the cycle tester encryption challenger and obtains the
challenge $c_2^*$.

6. $\mathcal{B}$ forms the challenge ciphertext by computing $c_1^* =$ Enc$_1$($pk_1, (m_{0,a}, m_{0,b}, c_2^*)$)
and sending $C^* = (c_1^*, c_2^*)$ to $\mathcal{A}$.

7. Eventually, $\mathcal{A}$ returns a bit $\hat{b}$ and $\mathcal{B}$ outputs $\hat{b}$ to its challenger.

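In the above, $\mathcal{B}$ can answer $\mathcal{A}$'s decryption queries itself, since it holds $sk_1$ and
Dec uses only $sk_1$. $\mathcal{B}$ perfectly simulates Game 2 for adversary $\mathcal{A}$ if the challenge
ciphertext $c_2^*$ contains an encryption of $m_{0,b}$ and, in the other case, $\mathcal{B}$ perfectly
simulates Game 3 for adversary $\mathcal{A}$ when the challenge ciphertext $c_2^*$ contains
an encryption of $m_{1,b}$. Moreover, $\mathcal{B}$ succeeds if and only if $\mathcal{A}$ succeeds. Thus, if
$\mathrm{Prob}_3 - \mathrm{Prob}_2 = \epsilon$, then we have

$$\Pr[\mathcal{B} \text{ is correct}] = \tfrac{1}{2}\Pr[\mathcal{B} \text{ is correct} \mid \text{IND-CPA challenger chose } 0] + \tfrac{1}{2}\Pr[\mathcal{B} \text{ is correct} \mid \text{IND-CPA challenger chose } 1]$$
$$= \tfrac{1}{2}\Pr[\mathcal{A} \text{ is correct} \mid \text{Game 2}] + \tfrac{1}{2}\Pr[\mathcal{A} \text{ is correct} \mid \text{Game 3}] = \tfrac{1}{2}(1 - \mathrm{Prob}_2) + \tfrac{1}{2}\mathrm{Prob}_3$$
$$= \tfrac{1}{2}(1 - \mathrm{Prob}_2) + \tfrac{1}{2}(\mathrm{Prob}_2 + \epsilon) = \tfrac{1}{2} + \tfrac{\epsilon}{2}.$$

Since we assumed the cycle tester was IND-CPA secure, it must hold that $\epsilon \le \mathrm{negl}(\lambda)$.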

Claim. If $\Pi$ is an IND-CCA-secure encryption scheme with security parameter
$\lambda$, then for any adversary $\mathcal{A}$, $\mathrm{Prob}_4 - \mathrm{Prob}_3 \le \mathrm{negl}(\lambda)$.

Proof. Suppose there exists an adversary $\mathcal{A}$ such that $\mathrm{Prob}_4 - \mathrm{Prob}_3 = \epsilon$. Then
we can construct an adversary $\mathcal{B}$ that uses $\mathcal{A}$ to show that $\Pi$ is not an
IND-CCA-secure encryption scheme. $\mathcal{B}$ works as follows:

1. $\mathcal{B}$ obtains the public key $pk_1$ from the IND-CCA encryption challenger.

2. $\mathcal{B}$ runs Setup$_2$($1^\lambda$) → PP$_2$ and KeyGen$_2$(PP$_2$) → $(pk_2, sk_2)$.

3. $\mathcal{B}$ sends $pk = (pk_1, pk_2)$ to $\mathcal{A}$.

4. On receiving a decryption query for ciphertext $C_i = (c_{i,1}, c_{i,2})$ from $\mathcal{A}$, $\mathcal{B}$
sends $c_{i,1}$ to its IND-CCA encryption challenger to obtain a message $M$. $\mathcal{B}$
returns $M$ to $\mathcal{A}$.

5. $\mathcal{A}$ returns two messages $M_0 = (m_{0,a}, m_{0,b})$ and $M_1 = (m_{1,a}, m_{1,b})$.

6. $\mathcal{B}$ computes $c_2^* =$ Enc$_2$($pk_2, m_{1,b}$) and sends $M_0' = (M_0, c_2^*)$ and $M_1' = (M_1, c_2^*)$
to the IND-CCA challenger and obtains the challenge $c_1^*$.

7. $\mathcal{B}$ sends the challenge ciphertext $C^* = (c_1^*, c_2^*)$ to $\mathcal{A}$.

8. On receiving a decryption query for ciphertext $C_i = (c_{i,1}, c_{i,2})$ where $c_{i,1} \ne
c_1^*$ from $\mathcal{A}$, $\mathcal{B}$ sends $c_{i,1}$ to its IND-CCA encryption challenger to obtain a
message $M$. $\mathcal{B}$ returns $M$ to $\mathcal{A}$.

9. Eventually, $\mathcal{A}$ returns a bit $\hat{b}$ and $\mathcal{B}$ outputs $\hat{b}$ to its challenger.

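In the above, $\mathcal{B}$ perfectly simulates Game 3 for adversary $\mathcal{A}$ if the challenge
ciphertext $c_1^*$ contains an encryption of $M_0'$ and, in the other case, $\mathcal{B}$ perfectly
simulates Game 4 for adversary $\mathcal{A}$ when the challenge ciphertext $c_1^*$ contains
an encryption of $M_1'$. Moreover, $\mathcal{B}$ succeeds if and only if $\mathcal{A}$ succeeds. Thus, if
$\mathrm{Prob}_4 - \mathrm{Prob}_3 = \epsilon$, then $\mathcal{B}$'s probability of success in the IND-CCA security game
is

$$\Pr[\mathcal{B} \text{ is correct}] = \tfrac{1}{2}\Pr[\mathcal{B} \text{ is correct} \mid \text{IND-CCA challenger chose } 0] + \tfrac{1}{2}\Pr[\mathcal{B} \text{ is correct} \mid \text{IND-CCA challenger chose } 1]$$
$$= \tfrac{1}{2}\Pr[\mathcal{A} \text{ is correct} \mid \text{Game 3}] + \tfrac{1}{2}\Pr[\mathcal{A} \text{ is correct} \mid \text{Game 4}] = \tfrac{1}{2}(1 - \mathrm{Prob}_3) + \tfrac{1}{2}\mathrm{Prob}_4$$
$$= \tfrac{1}{2}(1 - \mathrm{Prob}_3) + \tfrac{1}{2}(\mathrm{Prob}_3 + \epsilon) = \tfrac{1}{2} + \tfrac{\epsilon}{2}.$$

Since we assumed that $\Pi$ was IND-CCA secure, it must hold that
$\epsilon \le \mathrm{negl}(\lambda)$.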

Claim. For any adversary $\mathcal{A}$, $\mathrm{Prob}_4 = \mathrm{Prob}_5$.

Proof. These games are identical except that in Game 4 all decryption queries
$C_i = (c_{i,1}, c_{i,2})$ such that $c_{i,1} = c_1^*$ for $C^* = (c_1^*, c_2^*)$ are rejected, whereas in Game 5
only decryption queries $C_i = C^*$ are rejected. This results, however, in
identical behavior on the decryption queries. This case is the mirror image of
the argument in the proof of the first claim (Game 1 versus Game 2).

Conclusion of the Proof of Theorem 2. Given the above claims, we can
conclude that if $\Gamma$ is an IND-CPA-secure n-cycle tester and $\Pi$ is an
IND-CCA-secure encryption scheme (with the appropriate length constraints), then for
any adversary $\mathcal{A}$, it holds that $\mathrm{Prob}_5 - \mathrm{Prob}_1$ is negligible, implying that $\Pi'$ is
an IND-CCA-secure encryption scheme.


5 A 2-Cycle Tester from the k-DLIN Assumption


We now present a 2-cycle tester from the decision k-Linear assumption in pairing
groups for any constant $k$ (where this assumption is believed to hold for $k \ge 2$
in this bilinear setting and the assumption grows weaker as $k$ increases). We will
use a message space of $\{0,1\}^\lambda$. In our exposition we will use boldface to denote
a matrix such as $\mathbf{M}$. We also use $g^{\mathbf{M}}$ as shorthand to denote the group elements
obtained by raising $g$ to each individual element of $\mathbf{M}$.

Setup($1^\lambda$) → PP. The setup algorithm first runs $\mathcal{G}(1^\lambda)$ to generate a (Type-1)
group $G$ of prime order $p$ with generator $g$. Next it defines a pseudorandom
generator PRG : $\{0,1\}^\lambda \to \mathbb{Z}_p^{k \times k}$, which maps strings from $\{0,1\}^\lambda$ to invertible
$k \times k$ matrices over $\mathbb{Z}_p$. Finally, it chooses a random invertible matrix $\mathbf{A} \in \mathbb{Z}_p^{k \times k}$
and computes $g^{\mathbf{A}}$. The public parameters PP consist of the group description
$G$, the description of PRG, and $g^{\mathbf{A}}$.

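KeyGen(PP) → (pk, sk). The key generation algorithm first chooses a random
$w \in \{0,1\}^\lambda$. The secret key is $sk = w$. Next, it computes PRG($w$) → $\mathbf{W} \in \mathbb{Z}_p^{k \times k}$
and chooses a bit $\beta \in \{0,1\}$. Finally, in addition to implicitly including PP, it
defines the public key as

$$pk = \begin{cases} (0, K = g^{\mathbf{A}\mathbf{W}}) \in \{0,1\} \times G^{k \times k} & \text{if } \beta = 0; \\ (1, K = g^{\mathbf{W}\mathbf{A}}) \in \{0,1\} \times G^{k \times k} & \text{if } \beta = 1. \end{cases}$$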

Enc($pk = (\beta, K)$, $m \in \{0,1\}^\lambda$) → ct.

The encryption algorithm first computes PRG($m$) → $\mathbf{M} \in \mathbb{Z}_p^{k \times k}$
and then computes $\mathbf{M}^{-1}$. Note that since PRG maps to invertible matrices, $\mathbf{M}$
will have an inverse.

If the type bit $\beta = 0$, the key is $K = g^{\mathbf{A}\mathbf{W}}$ for some $\mathbf{W}$. The algorithm chooses
$\mathbf{r}$ as a random row vector of length $k$ in $\mathbb{Z}_p$ (i.e. a random matrix of dimension
$1 \times k$). The ciphertext is computed and output as

$$C_1 = g^{\mathbf{r}\mathbf{A}\mathbf{W}}, \qquad C_2 = g^{\mathbf{r}\mathbf{A}\mathbf{M}^{-1}}.$$

Thus, the ciphertext will consist of two row vectors in the exponent. We observe
that all terms are computable from the public keys and public parameters.

If the type bit $\beta = 1$, the key is $K = g^{\mathbf{W}\mathbf{A}}$ for some $\mathbf{W}$. The algorithm chooses $\mathbf{r}$
as a random column vector of length $k$ in $\mathbb{Z}_p$ (i.e. a random matrix of dimension
$k \times 1$). The ciphertext ct is computed and output as

$$C_1 = g^{\mathbf{W}\mathbf{A}\mathbf{r}}, \qquad C_2 = g^{\mathbf{M}^{-1}\mathbf{A}\mathbf{r}}.$$

Test($\mathbf{pk}, \mathbf{y}$) → {0, 1}. Since we are testing for 2-cycles, parse $\mathbf{y} = (C = (C_1, C_2),
C' = (C_1', C_2'))$. If the key types are identical, i.e. $\beta = \beta'$, then just output a
random bit as a guess.

Otherwise, presume that $\beta = 0$, $\beta' = 1$ (if it is the other way around just
flip the order). Then compute $e(C_1, C_2') \stackrel{?}{=} e(C_2, C_1')$ and output the result. Note
here we overload notation so that the pairing operator $e$ is over a matrix of group
elements and means matrix multiplication in the exponent. (Or in this case a
dot product in the exponent.)

Analysis of Test Algorithm. We analyze the correctness of the test algorithm.
Let’s consider two secret keys $w, w'$ where PRG$(w) = W$ and PRG$(w') = W'$.
Again, presume that $\beta = 0$, $\beta' = 1$. The corresponding public keys will be
pk $= g^{AW}$ and pk$' = g^{W'A}$. Now consider an encryption of $m$ under pk and
$m'$ under pk$'$ where PRG$(m) = M$ and PRG$(m') = M'$. Let $r$ and $r'$ be the
respective randomness used for each encryption.

The test outputs 1 iff $e(C_1, C'_2) = e(C_2, C'_1)$; this is equivalent to
testing

\[ rAWM'^{-1}Ar' = rAM^{-1}W'Ar'. \tag{1} \]

Let’s first consider the case where we have an encryption of a cycle. This
means that $m' = w$ and $m = w'$, so we have that $M'^{-1} = W^{-1}$ and
$M^{-1} = W'^{-1}$. Substituting these in, we see that

\[
\begin{aligned}
rAWM'^{-1}Ar' &= rAM^{-1}W'Ar' \\
rAIAr' &\stackrel{?}{=} rAIAr' \\
rA^2r' &= rA^2r'.
\end{aligned}
\]

Thus, on a cycle the test will output 1.

We now turn to the case of showing that an encryption of 0’s will output 0
(when the keys have different $\beta$ types) with all but negligible probability.

First, we let $Z = \mathrm{PRG}(0^\lambda)^{-1}$, which is the matrix used to encrypt the
all 0’s string. Second, we consider the probability of the tester outputting 1
when $W$ and $W'$ are chosen uniformly at random (and independently from $Z$)
from the set of full-rank matrices, as opposed to being the output of a
pseudorandom generator. If there were more than a negligible difference in the
probability of the test outputting 1 in these two cases, it would lead to an
attack on the security of the pseudorandom generator.

We can now observe that the matrices $X = AWZA$ and $X' = AZW'A$ are
distributed independently and uniformly at random over full-rank matrices. Note
that we substituted $Z$ for both $M'^{-1}$ and $M^{-1}$ in Eq. (1). Then $u = rX$ and
$u' = rX'$ are independently distributed as uniformly random row vectors of length $k$.
Finally, it follows that the probability that

\[ u r' = u' r' \]

is negligible in the security parameter. Thus, with probability negligibly close to
1 the test algorithm will output 0 when given an encryption of all 0’s.
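
The final step can be spelled out as follows. With $u = rX$ and $u' = rX'$ as above, the test outputs 1 exactly when $u r' = u' r'$. This is a sketch of the usual counting argument, under the assumption that $r'$ is uniform in $\mathbb{Z}_p^k$ and independent of $u$ and $u'$:

```latex
\[
\Pr[\,u r' = u' r'\,]
  = \Pr[\,(u - u')\, r' = 0\,]
  \le \Pr[\,u = u'\,] + \Pr[\,(u - u')\, r' = 0 \mid u \ne u'\,]
  = \Pr[\,u = u'\,] + \frac{1}{p}.
\]
```

For a fixed nonzero row vector $d = u - u'$, exactly a $1/p$ fraction of column vectors $r'$ satisfy $d\,r' = 0$. Both terms are negligible when $p$ is superpolynomial in the security parameter.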

IND-CPA Security of the Tester

Theorem 3. The above encryption scheme $\Pi$ = (KeyGen, Enc, Test) (where the
decryption algorithm is ignored) is IND-CPA-secure under the k-Linear Assumption
in $G$.

The proof of this theorem can be found in the full version [11]. 

6 A 2-Cycle Tester from Learning with Errors 

We now present a 2-Cycle Tester whose IND-CPA security follows from the Learn- 
ing with Errors Assumption. We note that our construction is similar to multi-bit 
Regev encryption. 


6.1 Construction 

Setup($1^n$) $\to$ PP. The setup algorithm chooses $m, q, \ell, \sigma, r, \alpha$. These parameters
are chosen to satisfy the following constraints: $m > 2n \log q$, $\sigma > L \cdot \omega(\sqrt{\log m})$,
$q > 5\sigma(m + 1)$, $\ell > (n + m + 1) \log q + \omega(\log(n + m))$, $r := \sigma\ell$,
$\alpha < 1/(r\sqrt{m + 1} \cdot \omega(\sqrt{\log n}))$, and $q > 2$ is prime. Here, $L$ is defined as
follows. We let $z$ denote the number of uniform random bits employed by TrapGen
to generate a matrix $B$ in $\mathbb{Z}_q^{n \times m}$ along with a trapdoor basis $T_B$. $L$ is a bound
such that $\|T_B\|_{GS} < L$ with overwhelming probability. (We note that this range of
parameters allows us to set $\alpha$ so that $n/\alpha$ is polynomial, and LWE is believed to
be hard in this parameter regime.) The public parameters are
PP $= (m, q, \ell, \sigma, r, \alpha, z)$.

KeyGen(PP) $\to$ (pk, sk). The key generation algorithm chooses a uniformly
random secret key sk in $\{0,1\}^z$ and runs TrapGen(sk) to produce a matrix
$B \in \mathbb{Z}_q^{n \times m}$ and a corresponding trapdoor basis $T_B$. It then chooses independent
and uniformly random vectors $c_1, \ldots, c_\ell \in \mathbb{Z}_q^n$ and noise vectors
$\gamma_1, \ldots, \gamma_\ell$ from $\chi^m$, where $\chi$ is distributed as $\lfloor q\Psi_\alpha \rceil \bmod q$, where $\Psi_\alpha$ is
a distribution on $\mathbb{T}$ of a normal variable with mean 0 and standard deviation
$\alpha/\sqrt{2\pi}$ reduced modulo 1. (We think of these vectors as row vectors.) In
addition to implicitly including the PP, it sets

\[ pk = \{c_1, \ldots, c_\ell,\ y_1 := c_1 B + \gamma_1, \ldots, y_\ell := c_\ell B + \gamma_\ell\}. \]

Enc(pk, $m \in \{0,1\}^z$) $\to$ ct. The encryption algorithm runs TrapGen($m$) to
produce a matrix $Z \in \mathbb{Z}_q^{n \times m}$ and a corresponding trapdoor basis $T_Z$. It chooses
random signs $r_1, \ldots, r_\ell \in \{-1, 1\}$ and computes $s := \sum_{i=1}^{\ell} r_i c_i$. It then uses $T_Z$
to sample a short (column) vector $v$ such that $Zv = s^t$ by calling the algorithm
SampleD. It computes $C = \sum_{i=1}^{\ell} r_i y_i$ and sets the ciphertext as ct $= (C, v)$.
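
The linear structure of KeyGen and Enc can be illustrated with a toy sketch. It is not the construction itself: a uniformly random matrix stands in for the output of TrapGen, small uniform integers stand in for the noise distribution, the vector $v$ from SampleD is omitted, and the parameters are far too small to satisfy the constraints in Setup. All names (`keygen_toy`, `combine`, `noise_bound`) are hypothetical. The sketch only checks that the signed sum of the $y_i$ equals the signed sum of the $c_i$ times $B$, plus the signed sum of the noise vectors, modulo $q$.

```python
import random

def keygen_toy(rng, n, m, ell, q, noise_bound):
    """Return (B, cs, gammas, ys) with y_i = c_i B + gamma_i (mod q)."""
    # Assumption: a uniform B replaces TrapGen, so no trapdoor is produced.
    B = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
    cs = [[rng.randrange(q) for _ in range(n)] for _ in range(ell)]
    # Assumption: bounded uniform noise replaces the discretized Gaussian.
    gammas = [[rng.randint(-noise_bound, noise_bound) for _ in range(m)]
              for _ in range(ell)]
    ys = [[(sum(c[k] * B[k][j] for k in range(n)) + g[j]) % q for j in range(m)]
          for c, g in zip(cs, gammas)]
    return B, cs, gammas, ys

def combine(signs, rows, q):
    """Signed sum of row vectors modulo q: sum_i signs[i] * rows[i]."""
    width = len(rows[0])
    return [sum(r * row[j] for r, row in zip(signs, rows)) % q for j in range(width)]

rng = random.Random(1)
n, m, ell, q = 4, 8, 16, 7681  # toy sizes, chosen only for illustration
B, cs, gammas, ys = keygen_toy(rng, n, m, ell, q, noise_bound=2)
signs = [rng.choice((-1, 1)) for _ in range(ell)]

s = combine(signs, cs, q)        # s   = sum_i r_i c_i
C = combine(signs, ys, q)        # C   = sum_i r_i y_i
psi = combine(signs, gammas, q)  # psi = sum_i r_i gamma_i

# Check the identity C = s B + psi (mod q).
sB_plus_psi = [(sum(s[k] * B[k][j] for k in range(n)) + psi[j]) % q for j in range(m)]
identity_holds = (C == sB_plus_psi)
```

The identity holds for every choice of signs because each $y_i$ is linear in $c_i$ and $\gamma_i$.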

Test((pk$_0$, pk$_1$), (($C_0, v_0$), ($C_1, v_1$))) $\to \{0,1\}$. The cycle test algorithm compares
$C_0 v_1$ to $C_1 v_0$ and checks if they are close modulo $q$ (if their distance is $< 2q/5$).
If so, it outputs 1. If not, it outputs 0.
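
The comparison step can be sketched as below. This assumes that "distance modulo $q$" means the shortest distance between the two residues on the cycle $\mathbb{Z}_q$, and that the two inner products have already been computed as integers. The function names are hypothetical.

```python
def centered_distance(a, b, q):
    """Shortest distance between residues a and b on the cycle Z_q."""
    d = (a - b) % q
    return min(d, q - d)

def cycle_test(c0_v1, c1_v0, q):
    """Return 1 if the two scalars are within 2q/5 of each other modulo q, else 0."""
    # Compare 5*d < 2*q in integers to avoid floating-point rounding.
    return 1 if 5 * centered_distance(c0_v1, c1_v0, q) < 2 * q else 0

near = cycle_test(3, 99, 101)   # residues 3 and 99 are 5 apart modulo 101
far = cycle_test(0, 50, 101)    # residues 0 and 50 are 50 apart modulo 101
```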

Analysis of Test Algorithm. We let $B_0, Z_0, s_0$ be the $B$, $Z$ and $s$ values
corresponding to ciphertext $(C_0, v_0)$ and $B_1, Z_1, s_1$ be the analogous values for
$(C_1, v_1)$. When there is a cycle, we then have $Z_0 = B_1$ and $Z_1 = B_0$. We then
have $B_0 v_1 = s_1^t$ and $B_1 v_0 = s_0^t$. Noting that $C_0 = s_0 B_0 + \psi_0$ for some small
vector $\psi_0$, we see that

\[ C_0 v_1 = s_0 B_0 v_1 + \psi_0 v_1 = s_0 s_1^t + \psi_0 v_1. \]

Similarly, $C_1 = s_1 B_1 + \psi_1$ for some small vector $\psi_1$, so we have that

\[ C_1 v_0 = s_1 B_1 v_0 + \psi_1 v_0 = s_1 s_0^t + \psi_1 v_0. \]

We consider the size of $\psi_0 v_1 - \psi_1 v_0$ modulo $q$. First, $|\psi_0 v_1|$ is at most $\ell$ times
the maximal size of $|\gamma_j v_1|$. Using the same analysis as in the proof of Lemma
8.2 of [21], each of these is $< \frac{q}{5\ell}$ with high probability. Thus,
$|\psi_0 v_1 - \psi_1 v_0| < \frac{2q}{5}$ with high probability.

Since all of $v_0, v_1, \psi_0, \psi_1$ are short, this will cause these values to be close
modulo $q$, so the cycle test will output 1 with high probability.
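
The reason only the noise terms matter can be written out explicitly. This is a short worked step using the two expansions above; it relies only on the fact that $s_0 s_1^t$ and $s_1 s_0^t$ are the same scalar.

```latex
\[
C_0 v_1 - C_1 v_0
  = \bigl(s_0 s_1^t + \psi_0 v_1\bigr) - \bigl(s_1 s_0^t + \psi_1 v_0\bigr)
  = \psi_0 v_1 - \psi_1 v_0 \pmod{q},
\]
\[
|\psi_0 v_1 - \psi_1 v_0| \le |\psi_0 v_1| + |\psi_1 v_0|.
\]
```

So on a cycle the distance between $C_0 v_1$ and $C_1 v_0$ is controlled entirely by the two small inner products, each of which is bounded separately in the text.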

When there is no cycle, the matrices $B_0$ and $B_1$ are (statistically close to)
independent, uniformly random matrices. Thus the probability that $s_0 B_0 v_1$ and
$s_1 B_1 v_0$ will be within $\frac{2}{5}q$ modulo $q$ is negligibly close to $\frac{4}{5}$. Thus the cycle test
wins the distinguishing game with probability negligibly close to
$\frac{1}{2} + \frac{1}{2} \cdot \frac{1}{5} = \frac{3}{5}$.
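
The arithmetic behind this estimate can be checked with a small count. The sketch assumes that, without a cycle, the difference of the two compared values is uniform modulo $q$, that the acceptance threshold is $2q/5$ as in Test, that a cycle always passes, and that the game presents each case with probability $1/2$. The modulus `10007` and the function name are hypothetical.

```python
from fractions import Fraction

def accept_fraction(q):
    """Fraction of differences d in Z_q whose centered distance is below 2q/5."""
    hits = sum(1 for d in range(q) if 5 * min(d, q - d) < 2 * q)
    return Fraction(hits, q)

q = 10007  # hypothetical toy modulus
p_accept_random = accept_fraction(q)

# Win probability: cycles always pass, non-cycles are rejected with
# probability 1 - p_accept_random, and each case occurs half of the time.
p_win = Fraction(1, 2) + Fraction(1, 2) * (1 - p_accept_random)
```

For large $q$ the acceptance fraction approaches $4/5$, so the win probability approaches $3/5$.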


6.2 IND-CPA Security of the Tester 

To prove that this construction satisfies IND-CPA, we define a sequence of secu- 
rity games. 

Game 0 This is the regular IND-CPA security game for our construction: 

1. The challenger runs Setup($1^n$) $\to$ PP $= (m, q, \ell, \sigma, r, \alpha, z)$.

2. The challenger chooses a uniformly random secret key sk in $\{0,1\}^z$ and runs
TrapGen(sk) to produce a matrix $B \in \mathbb{Z}_q^{n \times m}$ and a corresponding trapdoor
basis $T_B$. It then chooses independent and uniformly random vectors
$c_1, \ldots, c_\ell \in \mathbb{Z}_q^n$ and noise vectors $\gamma_1, \ldots, \gamma_\ell$ from $\chi^m$. It sets

\[ pk = \{c_1, \ldots, c_\ell,\ y_1 := c_1 B + \gamma_1, \ldots, y_\ell := c_\ell B + \gamma_\ell\}. \]

The challenger gives the parameters PP and key pk to the attacker.

3. The attacker submits two messages $m_0, m_1$ to the challenger.

4. The challenger flips a coin $b \in \{0,1\}$. It runs TrapGen($m_b$) to produce a
matrix $Z \in \mathbb{Z}_q^{n \times m}$ and a corresponding trapdoor basis $T_Z$. It chooses random
signs $r_1, \ldots, r_\ell \in \{-1, 1\}$ and computes $s := \sum_{i=1}^{\ell} r_i c_i$. It then uses $T_Z$ to
sample a short (column) vector $v$ such that $Zv = s^t$, by calling the algorithm
SampleD. It computes $C = \sum_{i=1}^{\ell} r_i y_i$ and sets the ciphertext as $(C, v)$.

5. The attacker receives the challenge ciphertext. It then outputs a guess b' and 
wins if b' = b. 

Game 1 

2. The challenger chooses a uniformly random secret key sk in $\{0,1\}^z$ and
runs TrapGen(sk) to produce a matrix $B \in \mathbb{Z}_q^{n \times m}$ and a corresponding
trapdoor basis $T_B$. It then chooses independent and uniformly random vectors
$c_1, \ldots, c_\ell \in \mathbb{Z}_q^n$ and uniformly random vectors $y_1, \ldots, y_\ell \in \mathbb{Z}_q^m$. It sets
$pk = \{c_1, \ldots, c_\ell, y_1, \ldots, y_\ell\}$.

Game 2 

4. The challenger flips a coin $b \in \{0,1\}$. It runs TrapGen($m_b$) to produce a
matrix $Z \in \mathbb{Z}_q^{n \times m}$ and a corresponding trapdoor basis $T_Z$. It chooses $s$
randomly in $\mathbb{Z}_q^n$. It then uses $T_Z$ to sample a short (column) vector $v$ such that
$Zv = s^t$, by calling the algorithm SampleD. It chooses $C$ randomly from $\mathbb{Z}_q^m$
and sets the ciphertext as $(C, v)$.

Game 3

4. The challenger samples the vector $v$ from $\mathcal{D}_{\mathbb{Z}^m,\sigma}$. It chooses $C$ randomly from
$\mathbb{Z}_q^m$ and sets the ciphertext as $(C, v)$.

At this point, the distribution of the ciphertext is independent of the message, 
and it is clear that no PPT adversary can obtain a non-zero advantage. 

Lemma 3. Under the LWE assumption for the noise distribution $\chi$, no PPT
attacker can obtain a non-negligible difference in advantage between Game 0 and
Game 1.

Proof. We can collect the column vectors $c_1^t, \ldots, c_\ell^t$ into an $n \times \ell$ matrix we call
$D$. We can collect the row vectors $y_1, \ldots, y_\ell$ into an $\ell \times m$ matrix we call $Y$ and
the row vectors $\gamma_1, \ldots, \gamma_\ell$ into an $\ell \times m$ matrix we call $\Gamma$. We can then write the
public key as $D$, $D^t B + \Gamma$. Since $B$ is never published, each column of $B$ is a
fresh, uniform vector in $\mathbb{Z}_q^n$, and therefore each column of $D^t B + \Gamma$ is distributed
as an LWE sample with $D$ playing the role of the $n \times m$ matrix $A$ and the column
of $B$ playing the role of the random vector $s$. By a hybrid argument over the
columns, we can thus rely on LWE to change each $y_i$ to be uniformly distributed
in $\mathbb{Z}_q^m$.
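
In matrix form the column-wise view reads as follows. This is a sketch in which $b_j$, $Y_{\cdot j}$ and $\Gamma_{\cdot j}$ are notation introduced here for the $j$-th columns of $B$, $Y$ and $\Gamma$.

```latex
\[
Y = D^t B + \Gamma,
\qquad
Y_{\cdot j} = D^t b_j + \Gamma_{\cdot j} \quad (j = 1, \ldots, m).
\]
```

Each column $Y_{\cdot j}$ consists of $\ell$ LWE samples under the same secret $b_j$. The hybrid argument replaces the columns by uniform vectors one at a time, and after all $m$ steps every row $y_i$ is uniform.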

Lemma 4. No PPT attacker can obtain a non-negligible difference in advantage
between Game 1 and Game 2.

Proof. For this, we will argue that the distributions of $s, C$ in Game 1 and Game 2
are statistically close. This is a direct application of Lemma 2 with $j$ set to be
$n + m$. To see this, we consider the random signs $r_1, \ldots, r_\ell \in \{-1, 1\}$ as a column
vector $R$ of length $\ell$. We then consider the (vertical) concatenation of $s^t$ and $C^t$
into an $(n + m)$-length column vector. In Game 1, this is produced as $MR$, where $M$
is an $(n + m) \times \ell$ matrix formed by vertically concatenating $D$ and $Y^t$ as defined
in the proof of the previous lemma. Since the matrices $D, Y$ are now uniformly
chosen, replacing $MR$ by a uniformly random $(n + m) \times 1$ matrix (as in Game 2)
is a statistically close distribution by Lemma 2.

Lemma 5. No PPT attacker can obtain a non-negligible difference in advantage
between Game 2 and Game 3.

Proof. We will argue that the distributions of $v$ in Game 2 and Game 3 are
statistically close. We first observe that in Game 2, $v$ is chosen so that $Zv = s^t$ for
a uniformly random $s$ that is now independent of the rest of the ciphertext. The
distribution of $v$ here produced by SampleD is statistically close to $\mathcal{D}_{\Lambda^{s}(Z),\sigma}$.
Now by Lemma 1, if we consider the distribution $\mathcal{D}_{\mathbb{Z}^m,\sigma}$, the probability mass
on the preimages of $s^t$ under the mapping $Zv = s^t$ is (up to a negligible statistical
distance) the same for each $s$. Thus, the distribution of $v$ in both Game 2
and in Game 3 is statistically close to $\mathcal{D}_{\mathbb{Z}^m,\sigma}$.